linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework
@ 2018-01-12 13:00 Rafael J. Wysocki
  2018-01-12 13:10 ` [PATCH 1/2] PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume() Rafael J. Wysocki
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2018-01-12 13:00 UTC (permalink / raw)
  To: Linux PM, Geert Uytterhoeven; +Cc: Ulf Hansson, LKML

Hi,

This comes from the recent discussion/testing effort that ensued after my
pm_runtime_force_suspend|resume() changes proposal:

https://marc.info/?t=151497772000004&r=1&w=2

Patch [1/2] basically is https://patchwork.kernel.org/patch/10152873/ rebased
on top of the current linux-next branch of the linux-pm.git tree (the relevant
part should be there in the linux-next tree proper ATM).  It applies on top
of https://patchwork.kernel.org/patch/10156077/ which should apply to the Linus'
tree cleanly.

Patch [2/2] is a resend of https://patchwork.kernel.org/patch/10142047/ with
a very minor changelog modification and the R-b tag from Ulf.

Geert, if possible, please test this on the Renesas systems that had the
problem with https://patchwork.kernel.org/patch/10142047/ previously and let
me know if you still see issues.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume()
  2018-01-12 13:00 [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework Rafael J. Wysocki
@ 2018-01-12 13:10 ` Rafael J. Wysocki
  2018-01-12 14:07   ` Ulf Hansson
  2018-01-12 13:12 ` [PATCH 2/2] PM / runtime: Rework pm_runtime_force_suspend/resume() Rafael J. Wysocki
  2018-01-12 14:31 ` [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework Geert Uytterhoeven
  2 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2018-01-12 13:10 UTC (permalink / raw)
  To: Linux PM; +Cc: Geert Uytterhoeven, Ulf Hansson, LKML

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

There are problems with calling pm_runtime_force_suspend/resume()
to "stop" and "start" devices in genpd_finish_suspend() and
genpd_resume_noirq() (and in analogous hibernation-specific genpd
callbacks) after commit 122a22377a3d (PM / Domains: Stop/start
devices during system PM suspend/resume in genpd) as those routines
do much more than just "stopping" and "starting" devices (which was
the stated purpose of that commit) unnecessarily and may not play
well with system-wide PM driver callbacks.

First, consider the pm_runtime_force_suspend() in
genpd_finish_suspend().  If the current runtime PM status of the
device is "suspended", that function most likely does the right thing
by ignoring the device, because it should have been "stopped" already
and whatever needed to be done to deactivate it shoud have been done.
In turn, if the runtime PM status of the device is "active",
genpd_runtime_suspend() is called for it (indirectly) and (1) runs
the ->runtime_suspend callback provided by the device's driver
(assuming no bus type with ->runtime_suspend of its own), (2) "stops"
the device and (3) checks if the domain can be powered down, and then
(4) the device's runtime PM status is changed to "suspended".  Out of
the four actions above (1) is not necessary and it may be outright
harmful, (3) is pointless and (4) is questionable.  The only
operation that needs to be carried out here is (2).

The reason why (1) is not necessary is because the system-wide
PM callbacks provided by the device driver for the transition in
question have been run and they should have taken care of the
driver's part of device suspend already.  Moreover, it may be
harmful, because the ->runtime_suspend callback may want to
access the device which is partially suspended at that point
and may not be responsive.  Also, system-wide PM callbacks may
have been run already (in the previous phases of the system
transition under way) for the device's parent or for its supplier
devices (if any) and the device may not be accessible because of
that.

There also is no reason to do (3), because genpd_finish_suspend()
will repeat it anyway, and (4) potentially causes confusion to ensue
during the subsequent system transition to the working state.

Consider pm_runtime_force_resume() in genpd_resume_noirq() now.
It runs genpd_runtime_resume() for all devices with runtime PM
status set to "suspended", which includes all of the devices
whose runtime PM status was changed by pm_runtime_force_suspend()
before and may include some devices already suspended when the
pm_runtime_force_suspend() was running, which may be confusing.  The
genpd_runtime_resume() first tries to power up the domain, which
(again) is pointless, because genpd_resume_noirq() has done that
already.  Then, it "starts" the device and runs the ->runtime_resume
callback (from the driver, say) for it.  If all is well, the device
is left with the runtime PM status set to "active".

Unfortunately, running the driver's ->runtime_resume callback
before its system-wide PM callbacks and possibly before some
system-wide PM callbacks of the parent device's driver (let
alone supplier drivers) is asking for trouble, especially if
the device had been suspended before pm_runtime_force_suspend()
ran previously or if the callbacks in question expect to be run
back-to-back with their suspend-side counterparts.  It also should
not be necessary, because the system-wide PM driver callbacks that
will be invoked for the device subsequently should take care of
resuming it just fine.

[Running the driver's ->runtime_resume callback in the "noirq"
phase of the transition to the working state may be problematic
even for devices whose drivers do use pm_runtime_force_resume()
in (or as) their system-wide PM callbacks if they have suppliers
other than their parents, because it may cause the supplier to
be resumed after the consumer in some cases.]

Because of the above, modify genpd as follows:

 1. Change genpd_finish_suspend() to only "stop" devices with
    runtime PM status set to "active" (without invoking runtime PM
    callbacks for them, changing their runtime PM status and so on).

    That doesn't change the handling of devices whose drivers use
    pm_runtime_force_suspend/resume() in (or as) their system-wide
    PM callbacks and addresses the issues described above for the
    other devices.

 2. Change genpd_resume_noirq() to only "start" devices with
    runtime PM status set to "active" (without invoking runtime PM
    callbacks for them, changing their runtime PM status and so on).

    Again, that doesn't change the handling of devices whose drivers
    use pm_runtime_force_suspend/resume() in (or as) their system-wide
    PM callbacks and addresses the described issues for the other
    devices.  Devices with runtime PM status set to "suspended"
    are not started with the assumption that they will be resumed
    later, either by pm_runtime_force_resume() or via runtime PM.

 3. Change genpd_restore_noirq() to follow genpd_resume_noirq().

    That causes devices already suspended before hibernation to be
    left alone (which also is the case without the change) and
    avoids running the ->runtime_resume driver callback too early
    for the other devices.

 4. Change genpd_freeze_noirq() and genpd_thaw_noirq() in accordance
    with the above modifications.

Fixes: 122a22377a3d (PM / Domains: Stop/start devices during system PM suspend/resume in genpd)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/base/power/domain.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

Index: linux-pm/drivers/base/power/domain.c
===================================================================
--- linux-pm.orig/drivers/base/power/domain.c
+++ linux-pm/drivers/base/power/domain.c
@@ -1048,8 +1048,9 @@ static int genpd_finish_suspend(struct d
 	if (dev->power.wakeup_path && genpd_is_active_wakeup(genpd))
 		return 0;
 
-	if (genpd->dev_ops.stop && genpd->dev_ops.start) {
-		ret = pm_runtime_force_suspend(dev);
+	if (genpd->dev_ops.stop && genpd->dev_ops.start &&
+	    !pm_runtime_status_suspended(dev)) {
+		ret = genpd_stop_dev(genpd, dev);
 		if (ret) {
 			if (poweroff)
 				pm_generic_restore_noirq(dev);
@@ -1106,8 +1107,9 @@ static int genpd_resume_noirq(struct dev
 	genpd->suspended_count--;
 	genpd_unlock(genpd);
 
-	if (genpd->dev_ops.stop && genpd->dev_ops.start) {
-		ret = pm_runtime_force_resume(dev);
+	if (genpd->dev_ops.stop && genpd->dev_ops.start &&
+	    !pm_runtime_status_suspended(dev)) {
+		ret = genpd_start_dev(genpd, dev);
 		if (ret)
 			return ret;
 	}
@@ -1139,8 +1141,9 @@ static int genpd_freeze_noirq(struct dev
 	if (ret)
 		return ret;
 
-	if (genpd->dev_ops.stop && genpd->dev_ops.start)
-		ret = pm_runtime_force_suspend(dev);
+	if (genpd->dev_ops.stop && genpd->dev_ops.start &&
+	    !pm_runtime_status_suspended(dev))
+		ret = genpd_stop_dev(genpd, dev);
 
 	return ret;
 }
@@ -1163,8 +1166,9 @@ static int genpd_thaw_noirq(struct devic
 	if (IS_ERR(genpd))
 		return -EINVAL;
 
-	if (genpd->dev_ops.stop && genpd->dev_ops.start) {
-		ret = pm_runtime_force_resume(dev);
+	if (genpd->dev_ops.stop && genpd->dev_ops.start &&
+	    !pm_runtime_status_suspended(dev)) {
+		ret = genpd_start_dev(genpd, dev);
 		if (ret)
 			return ret;
 	}
@@ -1221,8 +1225,9 @@ static int genpd_restore_noirq(struct de
 	genpd_sync_power_on(genpd, true, 0);
 	genpd_unlock(genpd);
 
-	if (genpd->dev_ops.stop && genpd->dev_ops.start) {
-		ret = pm_runtime_force_resume(dev);
+	if (genpd->dev_ops.stop && genpd->dev_ops.start &&
+	    !pm_runtime_status_suspended(dev)) {
+		ret = genpd_start_dev(genpd, dev);
 		if (ret)
 			return ret;
 	}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/2] PM / runtime: Rework pm_runtime_force_suspend/resume()
  2018-01-12 13:00 [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework Rafael J. Wysocki
  2018-01-12 13:10 ` [PATCH 1/2] PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume() Rafael J. Wysocki
@ 2018-01-12 13:12 ` Rafael J. Wysocki
  2018-01-12 13:59   ` Ulf Hansson
  2018-01-12 14:31 ` [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework Geert Uytterhoeven
  2 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2018-01-12 13:12 UTC (permalink / raw)
  To: Linux PM; +Cc: Geert Uytterhoeven, Ulf Hansson, LKML, Lukas Wunner

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

One of the limitations of pm_runtime_force_suspend/resume() is that
if a parent driver wants to use these functions, all of its child
drivers generally have to do that too because of the parent usage
counter manipulations necessary to get the correct state of the parent
during system-wide transitions to the working state (system resume).
However, that limitation turns out to be artificial, so remove it.

Namely, pm_runtime_force_suspend() only needs to update the children
counter of its parent (if there's is a parent) when the device can
stay in suspend after the subsequent system resume transition, as
that counter is correct already otherwise.  Now, if the parent's
children counter is not updated, it is not necessary to increment
the parent's usage counter in that case any more, as long as the
children counters of devices are checked along with their usage
counters in order to decide whether or not the devices may be left
in suspend after the subsequent system resume transition.

Accordingly, modify pm_runtime_force_suspend() to only call
pm_runtime_set_suspended() for devices whose usage and children
counters are at the "no references" level (the runtime PM status
of the device needs to be updated to "suspended" anyway in case
this function is called once again for the same device during the
transition under way), drop the parent usage counter incrementation
from it and update pm_runtime_force_resume() to compensate for these
changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/base/power/runtime.c |   74 +++++++++++++++++++------------------------
 1 file changed, 34 insertions(+), 40 deletions(-)

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -1613,17 +1613,28 @@ void pm_runtime_drop_link(struct device
 	spin_unlock_irq(&dev->power.lock);
 }
 
+static bool pm_runtime_need_not_resume(struct device *dev)
+{
+	return atomic_read(&dev->power.usage_count) <= 1 &&
+		atomic_read(&dev->power.child_count) == 0;
+}
+
 /**
  * pm_runtime_force_suspend - Force a device into suspend state if needed.
  * @dev: Device to suspend.
  *
  * Disable runtime PM so we safely can check the device's runtime PM status and
- * if it is active, invoke it's .runtime_suspend callback to bring it into
- * suspend state. Keep runtime PM disabled to preserve the state unless we
- * encounter errors.
+ * if it is active, invoke its ->runtime_suspend callback to suspend it and
+ * change its runtime PM status field to RPM_SUSPENDED.  Also, if the device's
+ * usage and children counters don't indicate that the device was in use before
+ * the system-wide transition under way, decrement its parent's children counter
+ * (if there is a parent).  Keep runtime PM disabled to preserve the state
+ * unless we encounter errors.
  *
  * Typically this function may be invoked from a system suspend callback to make
- * sure the device is put into low power state.
+ * sure the device is put into low power state and it should only be used during
+ * system-wide PM transitions to sleep states.  It assumes that the analogous
+ * pm_runtime_force_resume() will be used to resume the device.
  */
 int pm_runtime_force_suspend(struct device *dev)
 {
@@ -1646,17 +1657,18 @@ int pm_runtime_force_suspend(struct devi
 		goto err;
 
 	/*
-	 * Increase the runtime PM usage count for the device's parent, in case
-	 * when we find the device being used when system suspend was invoked.
-	 * This informs pm_runtime_force_resume() to resume the parent
-	 * immediately, which is needed to be able to resume its children,
-	 * when not deferring the resume to be managed via runtime PM.
+	 * If the device can stay in suspend after the system-wide transition
+	 * to the working state that will follow, drop the children counter of
+	 * its parent, but set its status to RPM_SUSPENDED anyway in case this
+	 * function will be called again for it in the meantime.
 	 */
-	if (dev->parent && atomic_read(&dev->power.usage_count) > 1)
-		pm_runtime_get_noresume(dev->parent);
+	if (pm_runtime_need_not_resume(dev))
+		pm_runtime_set_suspended(dev);
+	else
+		__update_runtime_status(dev, RPM_SUSPENDED);
 
-	pm_runtime_set_suspended(dev);
 	return 0;
+
 err:
 	pm_runtime_enable(dev);
 	return ret;
@@ -1669,13 +1681,9 @@ EXPORT_SYMBOL_GPL(pm_runtime_force_suspe
  *
  * Prior invoking this function we expect the user to have brought the device
  * into low power state by a call to pm_runtime_force_suspend(). Here we reverse
- * those actions and brings the device into full power, if it is expected to be
- * used on system resume. To distinguish that, we check whether the runtime PM
- * usage count is greater than 1 (the PM core increases the usage count in the
- * system PM prepare phase), as that indicates a real user (such as a subsystem,
- * driver, userspace, etc.) is using it. If that is the case, the device is
- * expected to be used on system resume as well, so then we resume it. In the
- * other case, we defer the resume to be managed via runtime PM.
+ * those actions and bring the device into full power, if it is expected to be
+ * used on system resume.  In the other case, we defer the resume to be managed
+ * via runtime PM.
  *
  * Typically this function may be invoked from a system resume callback.
  */
@@ -1684,32 +1692,18 @@ int pm_runtime_force_resume(struct devic
 	int (*callback)(struct device *);
 	int ret = 0;
 
-	callback = RPM_GET_CALLBACK(dev, runtime_resume);
-
-	if (!callback) {
-		ret = -ENOSYS;
-		goto out;
-	}
-
-	if (!pm_runtime_status_suspended(dev))
+	if (!pm_runtime_status_suspended(dev) || pm_runtime_need_not_resume(dev))
 		goto out;
 
 	/*
-	 * Decrease the parent's runtime PM usage count, if we increased it
-	 * during system suspend in pm_runtime_force_suspend().
-	*/
-	if (atomic_read(&dev->power.usage_count) > 1) {
-		if (dev->parent)
-			pm_runtime_put_noidle(dev->parent);
-	} else {
-		goto out;
-	}
+	 * The value of the parent's children counter is correct already, so
+	 * just update the status of the device.
+	 */
+	__update_runtime_status(dev, RPM_ACTIVE);
 
-	ret = pm_runtime_set_active(dev);
-	if (ret)
-		goto out;
+	callback = RPM_GET_CALLBACK(dev, runtime_resume);
 
-	ret = callback(dev);
+	ret = callback ? callback(dev) : -ENOSYS;
 	if (ret) {
 		pm_runtime_set_suspended(dev);
 		goto out;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] PM / runtime: Rework pm_runtime_force_suspend/resume()
  2018-01-12 13:12 ` [PATCH 2/2] PM / runtime: Rework pm_runtime_force_suspend/resume() Rafael J. Wysocki
@ 2018-01-12 13:59   ` Ulf Hansson
  2018-01-13  0:41     ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Ulf Hansson @ 2018-01-12 13:59 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM, Geert Uytterhoeven, LKML, Lukas Wunner

On 12 January 2018 at 14:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> One of the limitations of pm_runtime_force_suspend/resume() is that
> if a parent driver wants to use these functions, all of its child
> drivers generally have to do that too because of the parent usage
> counter manipulations necessary to get the correct state of the parent
> during system-wide transitions to the working state (system resume).
> However, that limitation turns out to be artificial, so remove it.

According to my comment on the other thread, this stands true in case
the child is managed by runtime PM as well.

Otherwise this looks good to me.

>
> Namely, pm_runtime_force_suspend() only needs to update the children
> counter of its parent (if there's is a parent) when the device can
> stay in suspend after the subsequent system resume transition, as
> that counter is correct already otherwise.  Now, if the parent's
> children counter is not updated, it is not necessary to increment
> the parent's usage counter in that case any more, as long as the
> children counters of devices are checked along with their usage
> counters in order to decide whether or not the devices may be left
> in suspend after the subsequent system resume transition.
>
> Accordingly, modify pm_runtime_force_suspend() to only call
> pm_runtime_set_suspended() for devices whose usage and children
> counters are at the "no references" level (the runtime PM status
> of the device needs to be updated to "suspended" anyway in case
> this function is called once again for the same device during the
> transition under way), drop the parent usage counter incrementation
> from it and update pm_runtime_force_resume() to compensate for these
> changes.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
>  drivers/base/power/runtime.c |   74 +++++++++++++++++++------------------------
>  1 file changed, 34 insertions(+), 40 deletions(-)
>
> Index: linux-pm/drivers/base/power/runtime.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/runtime.c
> +++ linux-pm/drivers/base/power/runtime.c
> @@ -1613,17 +1613,28 @@ void pm_runtime_drop_link(struct device
>         spin_unlock_irq(&dev->power.lock);
>  }
>
> +static bool pm_runtime_need_not_resume(struct device *dev)
> +{
> +       return atomic_read(&dev->power.usage_count) <= 1 &&
> +               atomic_read(&dev->power.child_count) == 0;

How about adding an additional patch on top taking into account the
ignore_children flag and folding that into the series, kind of as you
also suggested?

My point is, we might as well take the opportunity to fix this right
away, don't you think?

[...]

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume()
  2018-01-12 13:10 ` [PATCH 1/2] PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume() Rafael J. Wysocki
@ 2018-01-12 14:07   ` Ulf Hansson
  0 siblings, 0 replies; 16+ messages in thread
From: Ulf Hansson @ 2018-01-12 14:07 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM, Geert Uytterhoeven, LKML

On 12 January 2018 at 14:10, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> There are problems with calling pm_runtime_force_suspend/resume()
> to "stop" and "start" devices in genpd_finish_suspend() and
> genpd_resume_noirq() (and in analogous hibernation-specific genpd
> callbacks) after commit 122a22377a3d (PM / Domains: Stop/start
> devices during system PM suspend/resume in genpd) as those routines
> do much more than just "stopping" and "starting" devices (which was
> the stated purpose of that commit) unnecessarily and may not play
> well with system-wide PM driver callbacks.
>
> First, consider the pm_runtime_force_suspend() in
> genpd_finish_suspend().  If the current runtime PM status of the
> device is "suspended", that function most likely does the right thing
> by ignoring the device, because it should have been "stopped" already
> and whatever needed to be done to deactivate it shoud have been done.
> In turn, if the runtime PM status of the device is "active",
> genpd_runtime_suspend() is called for it (indirectly) and (1) runs
> the ->runtime_suspend callback provided by the device's driver
> (assuming no bus type with ->runtime_suspend of its own), (2) "stops"
> the device and (3) checks if the domain can be powered down, and then
> (4) the device's runtime PM status is changed to "suspended".  Out of
> the four actions above (1) is not necessary and it may be outright
> harmful, (3) is pointless and (4) is questionable.  The only
> operation that needs to be carried out here is (2).
>
> The reason why (1) is not necessary is because the system-wide
> PM callbacks provided by the device driver for the transition in
> question have been run and they should have taken care of the
> driver's part of device suspend already.  Moreover, it may be
> harmful, because the ->runtime_suspend callback may want to
> access the device which is partially suspended at that point
> and may not be responsive.  Also, system-wide PM callbacks may
> have been run already (in the previous phases of the system
> transition under way) for the device's parent or for its supplier
> devices (if any) and the device may not be accessible because of
> that.
>
> There also is no reason to do (3), because genpd_finish_suspend()
> will repeat it anyway, and (4) potentially causes confusion to ensue
> during the subsequent system transition to the working state.
>
> Consider pm_runtime_force_resume() in genpd_resume_noirq() now.
> It runs genpd_runtime_resume() for all devices with runtime PM
> status set to "suspended", which includes all of the devices
> whose runtime PM status was changed by pm_runtime_force_suspend()
> before and may include some devices already suspended when the
> pm_runtime_force_suspend() was running, which may be confusing.  The
> genpd_runtime_resume() first tries to power up the domain, which
> (again) is pointless, because genpd_resume_noirq() has done that
> already.  Then, it "starts" the device and runs the ->runtime_resume
> callback (from the driver, say) for it.  If all is well, the device
> is left with the runtime PM status set to "active".
>
> Unfortunately, running the driver's ->runtime_resume callback
> before its system-wide PM callbacks and possibly before some
> system-wide PM callbacks of the parent device's driver (let
> alone supplier drivers) is asking for trouble, especially if
> the device had been suspended before pm_runtime_force_suspend()
> ran previously or if the callbacks in question expect to be run
> back-to-back with their suspend-side counterparts.  It also should
> not be necessary, because the system-wide PM driver callbacks that
> will be invoked for the device subsequently should take care of
> resuming it just fine.
>
> [Running the driver's ->runtime_resume callback in the "noirq"
> phase of the transition to the working state may be problematic
> even for devices whose drivers do use pm_runtime_force_resume()
> in (or as) their system-wide PM callbacks if they have suppliers
> other than their parents, because it may cause the supplier to
> be resumed after the consumer in some cases.]
>
> Because of the above, modify genpd as follows:
>
>  1. Change genpd_finish_suspend() to only "stop" devices with
>     runtime PM status set to "active" (without invoking runtime PM
>     callbacks for them, changing their runtime PM status and so on).
>
>     That doesn't change the handling of devices whose drivers use
>     pm_runtime_force_suspend/resume() in (or as) their system-wide
>     PM callbacks and addresses the issues described above for the
>     other devices.
>
>  2. Change genpd_resume_noirq() to only "start" devices with
>     runtime PM status set to "active" (without invoking runtime PM
>     callbacks for them, changing their runtime PM status and so on).
>
>     Again, that doesn't change the handling of devices whose drivers
>     use pm_runtime_force_suspend/resume() in (or as) their system-wide
>     PM callbacks and addresses the described issues for the other
>     devices.  Devices with runtime PM status set to "suspended"
>     are not started with the assumption that they will be resumed
>     later, either by pm_runtime_force_resume() or via runtime PM.
>
>  3. Change genpd_restore_noirq() to follow genpd_resume_noirq().
>
>     That causes devices already suspended before hibernation to be
>     left alone (which also is the case without the change) and
>     avoids running the ->runtime_resume driver callback too early
>     for the other devices.
>
>  4. Change genpd_freeze_noirq() and genpd_thaw_noirq() in accordance
>     with the above modifications.
>
> Fixes: 122a22377a3d (PM / Domains: Stop/start devices during system PM suspend/resume in genpd)
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

That was an extensive changlog, thanks for the details and for working on this!

Acked-by: Ulf Hansson <ulf.hansson@linaro.org>

Kind regards
Uffe

> ---
>  drivers/base/power/domain.c |   25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)
>
> Index: linux-pm/drivers/base/power/domain.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/domain.c
> +++ linux-pm/drivers/base/power/domain.c
> @@ -1048,8 +1048,9 @@ static int genpd_finish_suspend(struct d
>         if (dev->power.wakeup_path && genpd_is_active_wakeup(genpd))
>                 return 0;
>
> -       if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> -               ret = pm_runtime_force_suspend(dev);
> +       if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> +           !pm_runtime_status_suspended(dev)) {
> +               ret = genpd_stop_dev(genpd, dev);
>                 if (ret) {
>                         if (poweroff)
>                                 pm_generic_restore_noirq(dev);
> @@ -1106,8 +1107,9 @@ static int genpd_resume_noirq(struct dev
>         genpd->suspended_count--;
>         genpd_unlock(genpd);
>
> -       if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> -               ret = pm_runtime_force_resume(dev);
> +       if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> +           !pm_runtime_status_suspended(dev)) {
> +               ret = genpd_start_dev(genpd, dev);
>                 if (ret)
>                         return ret;
>         }
> @@ -1139,8 +1141,9 @@ static int genpd_freeze_noirq(struct dev
>         if (ret)
>                 return ret;
>
> -       if (genpd->dev_ops.stop && genpd->dev_ops.start)
> -               ret = pm_runtime_force_suspend(dev);
> +       if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> +           !pm_runtime_status_suspended(dev))
> +               ret = genpd_stop_dev(genpd, dev);
>
>         return ret;
>  }
> @@ -1163,8 +1166,9 @@ static int genpd_thaw_noirq(struct devic
>         if (IS_ERR(genpd))
>                 return -EINVAL;
>
> -       if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> -               ret = pm_runtime_force_resume(dev);
> +       if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> +           !pm_runtime_status_suspended(dev)) {
> +               ret = genpd_start_dev(genpd, dev);
>                 if (ret)
>                         return ret;
>         }
> @@ -1221,8 +1225,9 @@ static int genpd_restore_noirq(struct de
>         genpd_sync_power_on(genpd, true, 0);
>         genpd_unlock(genpd);
>
> -       if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> -               ret = pm_runtime_force_resume(dev);
> +       if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> +           !pm_runtime_status_suspended(dev)) {
> +               ret = genpd_start_dev(genpd, dev);
>                 if (ret)
>                         return ret;
>         }
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework
  2018-01-12 13:00 [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework Rafael J. Wysocki
  2018-01-12 13:10 ` [PATCH 1/2] PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume() Rafael J. Wysocki
  2018-01-12 13:12 ` [PATCH 2/2] PM / runtime: Rework pm_runtime_force_suspend/resume() Rafael J. Wysocki
@ 2018-01-12 14:31 ` Geert Uytterhoeven
  2018-01-13  0:38   ` Rafael J. Wysocki
  2 siblings, 1 reply; 16+ messages in thread
From: Geert Uytterhoeven @ 2018-01-12 14:31 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM, Ulf Hansson, LKML, Linux-Renesas

Hi Rafael,

On Fri, Jan 12, 2018 at 2:00 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> This comes from the recent discussion/testing effort that ensued after my
> pm_runtime_force_suspend|resume() changes proposal:
>
> https://marc.info/?t=151497772000004&r=1&w=2
>
> Patch [1/2] basically is https://patchwork.kernel.org/patch/10152873/ rebased
> on top of the current linux-next branch of the linux-pm.git tree (the relevant
> part should be there in the linux-next tree proper ATM).  It applies on top
> of https://patchwork.kernel.org/patch/10156077/ which should apply to the Linus'
> tree cleanly.
>
> Patch [2/2] is a resend of https://patchwork.kernel.org/patch/10142047/ with
> a very minor changelog modification and the R-b tag from Ulf.
>
> Geert, if possible, please test this on the Renesas systems that had the
> problem with https://patchwork.kernel.org/patch/10142047/ previously and let
> me know if you still see issues.

I've tested this on two very similar systems: Salvator-XS with R-Car H3 ES2.0,
and Salvator-X with R-Car M3-W ES1.0.

On the M3-based system, everything seems to work fine.
On the H3-based system, the serial console (the /dev/ttySC0 device, not kernel
serial output) is dead after resume from s2ram, with and without
no_console_suspend.

With no_console_suspend, I see:

    ttySC ttySC0: 1 input overrun(s)

after typing on the serial console, so it looks like an interrupt problem.

This issue seems to be caused by patch [1/2]. But I have no idea what's
really happening, and why the two systems behave differently.

Oh well, have a nice weekend!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework
  2018-01-12 14:31 ` [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework Geert Uytterhoeven
@ 2018-01-13  0:38   ` Rafael J. Wysocki
  2018-01-14  9:48     ` Geert Uytterhoeven
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2018-01-13  0:38 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Linux PM, Ulf Hansson, LKML, Linux-Renesas

On Friday, January 12, 2018 3:31:09 PM CET Geert Uytterhoeven wrote:
> Hi Rafael,
> 
> On Fri, Jan 12, 2018 at 2:00 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > This comes from the recent discussion/testing effort that ensued after my
> > pm_runtime_force_suspend|resume() changes proposal:
> >
> > https://marc.info/?t=151497772000004&r=1&w=2
> >
> > Patch [1/2] basically is https://patchwork.kernel.org/patch/10152873/ rebased
> > on top of the current linux-next branch of the linux-pm.git tree (the relevant
> > part should be there in the linux-next tree proper ATM).  It applies on top
> > of https://patchwork.kernel.org/patch/10156077/ which should apply to the Linus'
> > tree cleanly.
> >
> > Patch [2/2] is a resend of https://patchwork.kernel.org/patch/10142047/ with
> > a very minor changelog modification and the R-b tag from Ulf.
> >
> > Geert, if possible, please test this on the Renesas systems that had the
> > problem with https://patchwork.kernel.org/patch/10142047/ previously and let
> > me know if you still see issues.
> 
> I've tested this on two very similar systems: Salvator-XS with R-Car H3 ES2.0,
> and Salvator-X with R-Car M3-W ES1.0.
> 
> On the M3-based system, everything seems to work fine.

Good.

> On the H3-based system, the serial console (the /dev/ttySC0 device, not kernel
> serial output) is dead after resume from s2ram, with and without
> no_console_suspend.
> 
> With no_console_suspend, I see:
> 
>     ttySC ttySC0: 1 input overrun(s)
> 
> after typing on the serial console, so it looks like an interrupt problem.
> 
> This issue seems to be caused by patch [1/2]. But I have no idea what's
> really happening, and why the two systems behave differently.

Well, that's not dramatic.

Let's make a deal that we'll fix this on top of [1/2].

Which driver is this BTW?  sh-sci?  That one doesn't even support runtime
PM, confusingly enough.

> Oh well, have a nice weekend!

Thanks, you too!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] PM / runtime: Rework pm_runtime_force_suspend/resume()
  2018-01-12 13:59   ` Ulf Hansson
@ 2018-01-13  0:41     ` Rafael J. Wysocki
  0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2018-01-13  0:41 UTC (permalink / raw)
  To: Ulf Hansson; +Cc: Linux PM, Geert Uytterhoeven, LKML, Lukas Wunner

On Friday, January 12, 2018 2:59:38 PM CET Ulf Hansson wrote:
> On 12 January 2018 at 14:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > One of the limitations of pm_runtime_force_suspend/resume() is that
> > if a parent driver wants to use these functions, all of its child
> > drivers generally have to do that too because of the parent usage
> > counter manipulations necessary to get the correct state of the parent
> > during system-wide transitions to the working state (system resume).
> > However, that limitation turns out to be artificial, so remove it.
> 
> According to my comment on the other thread, this stands true in case
> the child is managed by runtime PM as well.

What do you mean by "managed by runtime PM"?

> Otherwise this looks good to me.
> 
> >
> > Namely, pm_runtime_force_suspend() only needs to update the children
> > counter of its parent (if there's is a parent) when the device can
> > stay in suspend after the subsequent system resume transition, as
> > that counter is correct already otherwise.  Now, if the parent's
> > children counter is not updated, it is not necessary to increment
> > the parent's usage counter in that case any more, as long as the
> > children counters of devices are checked along with their usage
> > counters in order to decide whether or not the devices may be left
> > in suspend after the subsequent system resume transition.
> >
> > Accordingly, modify pm_runtime_force_suspend() to only call
> > pm_runtime_set_suspended() for devices whose usage and children
> > counters are at the "no references" level (the runtime PM status
> > of the device needs to be updated to "suspended" anyway in case
> > this function is called once again for the same device during the
> > transition under way), drop the parent usage counter incrementation
> > from it and update pm_runtime_force_resume() to compensate for these
> > changes.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> >  drivers/base/power/runtime.c |   74 +++++++++++++++++++------------------------
> >  1 file changed, 34 insertions(+), 40 deletions(-)
> >
> > Index: linux-pm/drivers/base/power/runtime.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/runtime.c
> > +++ linux-pm/drivers/base/power/runtime.c
> > @@ -1613,17 +1613,28 @@ void pm_runtime_drop_link(struct device
> >         spin_unlock_irq(&dev->power.lock);
> >  }
> >
> > +static bool pm_runtime_need_not_resume(struct device *dev)
> > +{
> > +       return atomic_read(&dev->power.usage_count) <= 1 &&
> > +               atomic_read(&dev->power.child_count) == 0;
> 
> How about adding an additional patch on top taking into account the
> ignore_children flag and folding that into the series, kind of as you
> also suggested?

I will do that, no worries.

> My point is, we might as well take the opportunity to fix this right
> away, don't you think?

OK, I'll send a patch on top of this series.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework
  2018-01-13  0:38   ` Rafael J. Wysocki
@ 2018-01-14  9:48     ` Geert Uytterhoeven
  2018-01-15  0:04       ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Geert Uytterhoeven @ 2018-01-14  9:48 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM, Ulf Hansson, LKML, Linux-Renesas

Hi Rafael,

On Sat, Jan 13, 2018 at 1:38 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Friday, January 12, 2018 3:31:09 PM CET Geert Uytterhoeven wrote:
>> On Fri, Jan 12, 2018 at 2:00 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > This comes from the recent discussion/testing effort that ensued after my
>> > pm_runtime_force_suspend|resume() changes proposal:
>> >
>> > https://marc.info/?t=151497772000004&r=1&w=2
>> >
>> > Patch [1/2] basically is https://patchwork.kernel.org/patch/10152873/ rebased
>> > on top of the current linux-next branch of the linux-pm.git tree (the relevant
>> > part should be there in the linux-next tree proper ATM).  It applies on top
>> > of https://patchwork.kernel.org/patch/10156077/ which should apply to the Linus'
>> > tree cleanly.
>> >
>> > Patch [2/2] is a resend of https://patchwork.kernel.org/patch/10142047/ with
>> > a very minor changelog modification and the R-b tag from Ulf.
>> >
>> > Geert, if possible, please test this on the Renesas systems that had the
>> > problem with https://patchwork.kernel.org/patch/10142047/ previously and let
>> > me know if you still see issues.
>>
>> I've tested this on two very similar systems: Salvator-XS with R-Car H3 ES2.0,
>> and Salvator-X with R-Car M3-W ES1.0.
>>
>> On the M3-based system, everything seems to work fine.
>
> Good.
>
>> On the H3-based system, the serial console (the /dev/ttySC0 device, not kernel
>> serial output) is dead after resume from s2ram, with and without
>> no_console_suspend.
>>
>> With no_console_suspend, I see:
>>
>>     ttySC ttySC0: 1 input overrun(s)
>>
>> after typing on the serial console, so it looks like an interrupt problem.
>>
>> This issue seems to be caused by patch [1/2]. But I have no idea what's
>> really happening, and why the two systems behave differently.

Could be a firmware issue, too.
While the kernel images are identical, the ARM trusted firmware configs aren't
(same version, though).

I'll do some more investigation...

> Well, that's not dramatic.
>
> Let's make a deal that we'll fix this on top of [1/2].

;-)

> Which driver is this BTW?  sh-sci?  That one doesn't even support runtime
> PM, confusingly enough.

Yes, sh-sci. It does make pm_runtime_*() calls.
And of course there's uart_ops.pm, which is driven from serial_core...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework
  2018-01-14  9:48     ` Geert Uytterhoeven
@ 2018-01-15  0:04       ` Rafael J. Wysocki
  2018-01-15  8:16         ` Geert Uytterhoeven
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2018-01-15  0:04 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Rafael J. Wysocki, Linux PM, Ulf Hansson, LKML, Linux-Renesas

On Sun, Jan 14, 2018 at 10:48 AM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> Hi Rafael,
>
> On Sat, Jan 13, 2018 at 1:38 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> On Friday, January 12, 2018 3:31:09 PM CET Geert Uytterhoeven wrote:
>>> On Fri, Jan 12, 2018 at 2:00 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> > This comes from the recent discussion/testing effort that ensued after my
>>> > pm_runtime_force_suspend|resume() changes proposal:
>>> >
>>> > https://marc.info/?t=151497772000004&r=1&w=2
>>> >
>>> > Patch [1/2] basically is https://patchwork.kernel.org/patch/10152873/ rebased
>>> > on top of the current linux-next branch of the linux-pm.git tree (the relevant
>>> > part should be there in the linux-next tree proper ATM).  It applies on top
>>> > of https://patchwork.kernel.org/patch/10156077/ which should apply to the Linus'
>>> > tree cleanly.
>>> >
>>> > Patch [2/2] is a resend of https://patchwork.kernel.org/patch/10142047/ with
>>> > a very minor changelog modification and the R-b tag from Ulf.
>>> >
>>> > Geert, if possible, please test this on the Renesas systems that had the
>>> > problem with https://patchwork.kernel.org/patch/10142047/ previously and let
>>> > me know if you still see issues.
>>>
>>> I've tested this on two very similar systems: Salvator-XS with R-Car H3 ES2.0,
>>> and Salvator-X with R-Car M3-W ES1.0.
>>>
>>> On the M3-based system, everything seems to work fine.
>>
>> Good.
>>
>>> On the H3-based system, the serial console (the /dev/ttySC0 device, not kernel
>>> serial output) is dead after resume from s2ram, with and without
>>> no_console_suspend.
>>>
>>> With no_console_suspend, I see:
>>>
>>>     ttySC ttySC0: 1 input overrun(s)
>>>
>>> after typing on the serial console, so it looks like an interrupt problem.
>>>
>>> This issue seems to be caused by patch [1/2]. But I have no idea what's
>>> really happening, and why the two systems behave differently.
>
> Could be a firmware issue, too.
> While the kernel images are identical, the ARM trusted firmware configs aren't
> (same version, though).
>
> I'll do some more investigation...

OK, thanks!

It also would be good to know the topology of the device hierarchy and
how that maps to the domains on the failing system (and which UART
clocks are operated by genpd).

>> Well, that's not dramatic.
>>
>> Let's make a deal that we'll fix this on top of [1/2].
>
> ;-)
>
>> Which driver is this BTW?  sh-sci?  That one doesn't even support runtime
>> PM, confusingly enough.
>
> Yes, sh-sci. It does make pm_runtime_*() calls.

Hmm.  I overlooked that part.

This is sort of unusual, because the driver doesn't provide any
runtime PM callbacks, but still it does provided system suspend ones.
It looks like the idea is to never put it into runtime suspend if any
ports are enabled and always put it into runtime suspend otherwise.

Which one is the case in your testing?  Is the port disabled or
enabled during system-wide suspend?

> And of course there's uart_ops.pm, which is driven from serial_core...

What does this point to for that particular device?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework
  2018-01-15  0:04       ` Rafael J. Wysocki
@ 2018-01-15  8:16         ` Geert Uytterhoeven
  2018-01-15 13:22           ` Geert Uytterhoeven
  0 siblings, 1 reply; 16+ messages in thread
From: Geert Uytterhoeven @ 2018-01-15  8:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, Ulf Hansson, LKML, Linux-Renesas

Hi Rafael,

On Mon, Jan 15, 2018 at 1:04 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Sun, Jan 14, 2018 at 10:48 AM, Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
>> On Sat, Jan 13, 2018 at 1:38 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> On Friday, January 12, 2018 3:31:09 PM CET Geert Uytterhoeven wrote:
>>>> On Fri, Jan 12, 2018 at 2:00 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>>> > This comes from the recent discussion/testing effort that ensued after my
>>>> > pm_runtime_force_suspend|resume() changes proposal:
>>>> >
>>>> > https://marc.info/?t=151497772000004&r=1&w=2
>>>> >
>>>> > Patch [1/2] basically is https://patchwork.kernel.org/patch/10152873/ rebased
>>>> > on top of the current linux-next branch of the linux-pm.git tree (the relevant
>>>> > part should be there in the linux-next tree proper ATM).  It applies on top
>>>> > of https://patchwork.kernel.org/patch/10156077/ which should apply to the Linus'
>>>> > tree cleanly.
>>>> >
>>>> > Patch [2/2] is a resend of https://patchwork.kernel.org/patch/10142047/ with
>>>> > a very minor changelog modification and the R-b tag from Ulf.
>>>> >
>>>> > Geert, if possible, please test this on the Renesas systems that had the
>>>> > problem with https://patchwork.kernel.org/patch/10142047/ previously and let
>>>> > me know if you still see issues.
>>>>
>>>> I've tested this on two very similar systems: Salvator-XS with R-Car H3 ES2.0,
>>>> and Salvator-X with R-Car M3-W ES1.0.
>>>>
>>>> On the M3-based system, everything seems to work fine.
>>>
>>> Good.
>>>
>>>> On the H3-based system, the serial console (the /dev/ttySC0 device, not kernel
>>>> serial output) is dead after resume from s2ram, with and without
>>>> no_console_suspend.
>>>>
>>>> With no_console_suspend, I see:
>>>>
>>>>     ttySC ttySC0: 1 input overrun(s)
>>>>
>>>> after typing on the serial console, so it looks like an interrupt problem.
>>>>
>>>> This issue seems to be caused by patch [1/2]. But I have no idea what's
>>>> really happening, and why the two systems behave differently.
>>
>> Could be a firmware issue, too.
>> While the kernel images are identical, the ARM trusted firmware configs aren't
>> (same version, though).
>>
>> I'll do some more investigation...
>
> OK, thanks!
>
> It also would be good to know the topology of the device hierarchy and
> how that maps to the domains on the failing system (and which UART
> clocks are operated by genpd).

The topology is the same on both systems.
The UART's module clock is operated by genpd, on both systems.

>>> Well, that's not dramatic.
>>>
>>> Let's make a deal that we'll fix this on top of [1/2].
>>
>> ;-)
>>
>>> Which driver is this BTW?  sh-sci?  That one doesn't even support runtime
>>> PM, confusingly enough.
>>
>> Yes, sh-sci. It does make pm_runtime_*() calls.
>
> Hmm.  I overlooked that part.
>
> This is sort of unusual, because the driver doesn't provide any
> runtime PM callbacks, but still it does provided system suspend ones.
> It looks like the idea is to never put it into runtime suspend if any
> ports are enabled and always put it into runtime suspend otherwise.
>
> Which one is the case in your testing?  Is the port disabled or
> enabled during system-wide suspend?

It's enabled on both systems, as a getty is running.

>> And of course there's uart_ops.pm, which is driven from serial_core...
>
> What does this point to for that particular device?

sci_pm(), on both systems.

See, there's no difference in topology on both systems, so I'll have to look
a bit deeper first...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework
  2018-01-15  8:16         ` Geert Uytterhoeven
@ 2018-01-15 13:22           ` Geert Uytterhoeven
  2018-01-15 14:26             ` Ulf Hansson
  0 siblings, 1 reply; 16+ messages in thread
From: Geert Uytterhoeven @ 2018-01-15 13:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, Ulf Hansson, LKML, Linux-Renesas

Hi Rafael,

On Mon, Jan 15, 2018 at 9:16 AM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> On Mon, Jan 15, 2018 at 1:04 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Sun, Jan 14, 2018 at 10:48 AM, Geert Uytterhoeven
>> <geert@linux-m68k.org> wrote:
>>> On Sat, Jan 13, 2018 at 1:38 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>>> On Friday, January 12, 2018 3:31:09 PM CET Geert Uytterhoeven wrote:
>>>>> On Fri, Jan 12, 2018 at 2:00 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>>>> > This comes from the recent discussion/testing effort that ensued after my
>>>>> > pm_runtime_force_suspend|resume() changes proposal:
>>>>> >
>>>>> > https://marc.info/?t=151497772000004&r=1&w=2
>>>>> >
>>>>> > Patch [1/2] basically is https://patchwork.kernel.org/patch/10152873/ rebased
>>>>> > on top of the current linux-next branch of the linux-pm.git tree (the relevant
>>>>> > part should be there in the linux-next tree proper ATM).  It applies on top
>>>>> > of https://patchwork.kernel.org/patch/10156077/ which should apply to the Linus'
>>>>> > tree cleanly.
>>>>> >
>>>>> > Patch [2/2] is a resend of https://patchwork.kernel.org/patch/10142047/ with
>>>>> > a very minor changelog modification and the R-b tag from Ulf.
>>>>> >
>>>>> > Geert, if possible, please test this on the Renesas systems that had the
>>>>> > problem with https://patchwork.kernel.org/patch/10142047/ previously and let
>>>>> > me know if you still see issues.
>>>>>
>>>>> I've tested this on two very similar systems: Salvator-XS with R-Car H3 ES2.0,
>>>>> and Salvator-X with R-Car M3-W ES1.0.
>>>>>
>>>>> On the M3-based system, everything seems to work fine.
>>>>
>>>> Good.
>>>>
>>>>> On the H3-based system, the serial console (the /dev/ttySC0 device, not kernel
>>>>> serial output) is dead after resume from s2ram, with and without
>>>>> no_console_suspend.
>>>>>
>>>>> With no_console_suspend, I see:
>>>>>
>>>>>     ttySC ttySC0: 1 input overrun(s)
>>>>>
>>>>> after typing on the serial console, so it looks like an interrupt problem.
>>>>>
>>>>> This issue seems to be caused by patch [1/2]. But I have no idea what's
>>>>> really happening, and why the two systems behave differently.
>>>
>>> Could be a firmware issue, too.
>>> While the kernel images are identical, the ARM trusted firmware configs aren't
>>> (same version, though).
>>>
>>> I'll do some more investigation...
>>
>> OK, thanks!
>>
>> It also would be good to know the topology of the device hierarchy and
>> how that maps to the domains on the failing system (and which UART
>> clocks are operated by genpd).
>
> The topology is the same on both systems.

I did miss a small difference in topology: in pm/linux-next, H3 has DMA
enabled for SCIF2, while M3 hasn't (yet).
With DMA enabled on M3, it fails in the same way.

As genpd_resume_noirq() no longer calls pm_runtime_force_resume(),
rcar_dmac_runtime_resume() is no longer called, and the DMAC's registers
are no longer reinitialized after system resume, breaking the serial port.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework
  2018-01-15 13:22           ` Geert Uytterhoeven
@ 2018-01-15 14:26             ` Ulf Hansson
  2018-01-15 16:17               ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Ulf Hansson @ 2018-01-15 14:26 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML, Linux-Renesas

On 15 January 2018 at 14:22, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> Hi Rafael,
>
> On Mon, Jan 15, 2018 at 9:16 AM, Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
>> On Mon, Jan 15, 2018 at 1:04 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>>> On Sun, Jan 14, 2018 at 10:48 AM, Geert Uytterhoeven
>>> <geert@linux-m68k.org> wrote:
>>>> On Sat, Jan 13, 2018 at 1:38 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>>>> On Friday, January 12, 2018 3:31:09 PM CET Geert Uytterhoeven wrote:
>>>>>> On Fri, Jan 12, 2018 at 2:00 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>>>>> > This comes from the recent discussion/testing effort that ensued after my
>>>>>> > pm_runtime_force_suspend|resume() changes proposal:
>>>>>> >
>>>>>> > https://marc.info/?t=151497772000004&r=1&w=2
>>>>>> >
>>>>>> > Patch [1/2] basically is https://patchwork.kernel.org/patch/10152873/ rebased
>>>>>> > on top of the current linux-next branch of the linux-pm.git tree (the relevant
>>>>>> > part should be there in the linux-next tree proper ATM).  It applies on top
>>>>>> > of https://patchwork.kernel.org/patch/10156077/ which should apply to the Linus'
>>>>>> > tree cleanly.
>>>>>> >
>>>>>> > Patch [2/2] is a resend of https://patchwork.kernel.org/patch/10142047/ with
>>>>>> > a very minor changelog modification and the R-b tag from Ulf.
>>>>>> >
>>>>>> > Geert, if possible, please test this on the Renesas systems that had the
>>>>>> > problem with https://patchwork.kernel.org/patch/10142047/ previously and let
>>>>>> > me know if you still see issues.
>>>>>>
>>>>>> I've tested this on two very similar systems: Salvator-XS with R-Car H3 ES2.0,
>>>>>> and Salvator-X with R-Car M3-W ES1.0.
>>>>>>
>>>>>> On the M3-based system, everything seems to work fine.
>>>>>
>>>>> Good.
>>>>>
>>>>>> On the H3-based system, the serial console (the /dev/ttySC0 device, not kernel
>>>>>> serial output) is dead after resume from s2ram, with and without
>>>>>> no_console_suspend.
>>>>>>
>>>>>> With no_console_suspend, I see:
>>>>>>
>>>>>>     ttySC ttySC0: 1 input overrun(s)
>>>>>>
>>>>>> after typing on the serial console, so it looks like an interrupt problem.
>>>>>>
>>>>>> This issue seems to be caused by patch [1/2]. But I have no idea what's
>>>>>> really happening, and why the two systems behave differently.
>>>>
>>>> Could be a firmware issue, too.
>>>> While the kernel images are identical, the ARM trusted firmware configs aren't
>>>> (same version, though).
>>>>
>>>> I'll do some more investigation...
>>>
>>> OK, thanks!
>>>
>>> It also would be good to know the topology of the device hierarchy and
>>> how that maps to the domains on the failing system (and which UART
>>> clocks are operated by genpd).
>>
>> The topology is the same on both systems.
>
> I did miss a small difference in topology: in pm/linux-next, H3 has DMA
> enabled for SCIF2, while M3 hasn't (yet).
> With DMA enabled on M3, it fails in the same way.
>
> As genpd_resume_noirq() no longer calls pm_runtime_force_resume(),
> rcar_dmac_runtime_resume() is no longer called, and the DMAC's registers
> are no longer reinitialized after system resume, breaking the serial port.

In drivers/dma/sh/rcar-dmac.c, I would try to replace the below line:
SET_SYSTEM_SLEEP_PM_OPS(rcar_dmac_sleep_suspend, rcar_dmac_sleep_resume)

with:
SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)

in case that may be too early to suspend the dma device (which is
rather common for dma devices) then try;

SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework
  2018-01-15 14:26             ` Ulf Hansson
@ 2018-01-15 16:17               ` Rafael J. Wysocki
  2018-01-17  9:14                 ` Geert Uytterhoeven
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2018-01-15 16:17 UTC (permalink / raw)
  To: Ulf Hansson, Geert Uytterhoeven
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML, Linux-Renesas

On Mon, Jan 15, 2018 at 3:26 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 15 January 2018 at 14:22, Geert Uytterhoeven <geert@linux-m68k.org> wrote:

[cut]

>>
>> I did miss a small difference in topology: in pm/linux-next, H3 has DMA
>> enabled for SCIF2, while M3 hasn't (yet).
>> With DMA enabled on M3, it fails in the same way.
>>
>> As genpd_resume_noirq() no longer calls pm_runtime_force_resume(),
>> rcar_dmac_runtime_resume() is no longer called, and the DMAC's registers
>> are no longer reinitialized after system resume, breaking the serial port.
>
> In drivers/dma/sh/rcar-dmac.c, I would try to replace the below line:
> SET_SYSTEM_SLEEP_PM_OPS(rcar_dmac_sleep_suspend, rcar_dmac_sleep_resume)
>
> with:
> SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)

Yes, that probably is the least intrusive thing that can be done to
address the issue.

> in case that may be too early to suspend the dma device (which is
> rather common for dma devices) then try;
>
> SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)

Good suggestion, and I would go straight for it anyway.

Geert, can you try if this works, please?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework
  2018-01-15 16:17               ` Rafael J. Wysocki
@ 2018-01-17  9:14                 ` Geert Uytterhoeven
  2018-01-17 11:11                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Geert Uytterhoeven @ 2018-01-17  9:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ulf Hansson, Rafael J. Wysocki, Linux PM, LKML, Linux-Renesas

Hi Rafael,

On Mon, Jan 15, 2018 at 5:17 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Mon, Jan 15, 2018 at 3:26 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> On 15 January 2018 at 14:22, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> [cut]
>
>>>
>>> I did miss a small difference in topology: in pm/linux-next, H3 has DMA
>>> enabled for SCIF2, while M3 hasn't (yet).
>>> With DMA enabled on M3, it fails in the same way.
>>>
>>> As genpd_resume_noirq() no longer calls pm_runtime_force_resume(),
>>> rcar_dmac_runtime_resume() is no longer called, and the DMAC's registers
>>> are no longer reinitialized after system resume, breaking the serial port.
>>
>> In drivers/dma/sh/rcar-dmac.c, I would try to replace the below line:
>> SET_SYSTEM_SLEEP_PM_OPS(rcar_dmac_sleep_suspend, rcar_dmac_sleep_resume)
>>
>> with:
>> SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
>
> Yes, that probably is the least intrusive thing that can be done to
> address the issue.
>
>> in case that may be too early to suspend the dma device (which is
>> rather common for dma devices) then try;
>>
>> SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
>
> Good suggestion, and I would go straight for it anyway.
>
> Geert, can you try if this works, please?

Works. Both using SET_SYSTEM_SLEEP_PM_OPS() and
SET_LATE_SYSTEM_SLEEP_PM_OPS(). But given this is a DMA engine
driver, I'd settle for the latter.

And I did verify doing so doesn't break the system without the patch
in $subject.

Thanks!

Will send a patch...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework
  2018-01-17  9:14                 ` Geert Uytterhoeven
@ 2018-01-17 11:11                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2018-01-17 11:11 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Rafael J. Wysocki, Ulf Hansson, Linux PM, LKML, Linux-Renesas

On Wednesday, January 17, 2018 10:14:23 AM CET Geert Uytterhoeven wrote:
> Hi Rafael,
> 
> On Mon, Jan 15, 2018 at 5:17 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Mon, Jan 15, 2018 at 3:26 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >> On 15 January 2018 at 14:22, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >
> > [cut]
> >
> >>>
> >>> I did miss a small difference in topology: in pm/linux-next, H3 has DMA
> >>> enabled for SCIF2, while M3 hasn't (yet).
> >>> With DMA enabled on M3, it fails in the same way.
> >>>
> >>> As genpd_resume_noirq() no longer calls pm_runtime_force_resume(),
> >>> rcar_dmac_runtime_resume() is no longer called, and the DMAC's registers
> >>> are no longer reinitialized after system resume, breaking the serial port.
> >>
> >> In drivers/dma/sh/rcar-dmac.c, I would try to replace the below line:
> >> SET_SYSTEM_SLEEP_PM_OPS(rcar_dmac_sleep_suspend, rcar_dmac_sleep_resume)
> >>
> >> with:
> >> SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
> >
> > Yes, that probably is the least intrusive thing that can be done to
> > address the issue.
> >
> >> in case that may be too early to suspend the dma device (which is
> >> rather common for dma devices) then try;
> >>
> >> SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
> >
> > Good suggestion, and I would go straight for it anyway.
> >
> > Geert, can you try if this works, please?
> 
> Works. Both using SET_SYSTEM_SLEEP_PM_OPS() and
> SET_LATE_SYSTEM_SLEEP_PM_OPS(). But given this is a DMA engine
> driver, I'd settle for the latter.
> 
> And I did verify doing so doesn't break the system without the patch
> in $subject.
> 
> Thanks!
> 
> Will send a patch...

Thank you!

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-01-17 11:13 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-12 13:00 [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework Rafael J. Wysocki
2018-01-12 13:10 ` [PATCH 1/2] PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume() Rafael J. Wysocki
2018-01-12 14:07   ` Ulf Hansson
2018-01-12 13:12 ` [PATCH 2/2] PM / runtime: Rework pm_runtime_force_suspend/resume() Rafael J. Wysocki
2018-01-12 13:59   ` Ulf Hansson
2018-01-13  0:41     ` Rafael J. Wysocki
2018-01-12 14:31 ` [PATCH 0/2] PM / core: genpd fix and pm_runtime_force_suspend|resume() rework Geert Uytterhoeven
2018-01-13  0:38   ` Rafael J. Wysocki
2018-01-14  9:48     ` Geert Uytterhoeven
2018-01-15  0:04       ` Rafael J. Wysocki
2018-01-15  8:16         ` Geert Uytterhoeven
2018-01-15 13:22           ` Geert Uytterhoeven
2018-01-15 14:26             ` Ulf Hansson
2018-01-15 16:17               ` Rafael J. Wysocki
2018-01-17  9:14                 ` Geert Uytterhoeven
2018-01-17 11:11                   ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).