All of lore.kernel.org
 help / color / mirror / Atom feed
* [RESEND PATCH 1/2] PM / sleep: print function name of callbacks
@ 2016-10-20  0:26 ` Brian Norris
  0 siblings, 0 replies; 25+ messages in thread
From: Brian Norris @ 2016-10-20  0:26 UTC (permalink / raw)
  To: Rafael J . Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman
  Cc: linux-kernel, Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov, Brian Norris

From: Douglas Anderson <dianders@chromium.org>

The printouts writen to the logs by suspend can be a bit opaque: it can
be hard to track them down to the actual function called.  You might
see:

  calling  rfkill1+ @ 19473, parent: phy0
  call rfkill1+ returned 0 after 1 usecs
  calling  phy0+ @ 19473, parent: mmc2:0001:1
  call phy0+ returned 0 after 19 usecs

It's a bit hard to know what's actually happening.  Instead, it's nice
to see:

  calling  rfkill1+ @ 15793, parent: phy0, cb: rfkill_suspend
  call rfkill1+ returned 0 after 1 usecs
  calling  phy0+ @ 15793, parent: mmc2:0001:1, cb: wiphy_suspend [cfg80211]
  call phy0+ returned 0 after 7 usecs

That makes it very obvious what's going on.  It also has the nice side
effect of making the suspend/resume spew a little more obvious, since
many resume functions have the word "resume" in the name:

  calling  phy0+ @ 15793, parent: mmc2:0001:1, cb: wiphy_resume [cfg80211]
  call phy0+ returned 0 after 12 usecs
  calling  rfkill1+ @ 15793, parent: phy0, cb: rfkill_resume
  call rfkill1+ returned 0 after 1 usecs

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
This is a resend of the following patch (w/ minor $subject alteration):

https://patchwork.kernel.org/patch/7241641/

It was useful for debugging the following patch, so I thought I'd resend. It
received an Ack the first time.

 drivers/base/power/main.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index e44944f4be77..c58563581345 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -190,14 +190,14 @@ void device_pm_move_last(struct device *dev)
 	list_move_tail(&dev->power.entry, &dpm_list);
 }
 
-static ktime_t initcall_debug_start(struct device *dev)
+static ktime_t initcall_debug_start(struct device *dev, void *cb)
 {
 	ktime_t calltime = ktime_set(0, 0);
 
 	if (pm_print_times_enabled) {
-		pr_info("calling  %s+ @ %i, parent: %s\n",
+		pr_info("calling  %s+ @ %i, parent: %s, cb: %pf\n",
 			dev_name(dev), task_pid_nr(current),
-			dev->parent ? dev_name(dev->parent) : "none");
+			dev->parent ? dev_name(dev->parent) : "none", cb);
 		calltime = ktime_get();
 	}
 
@@ -384,7 +384,7 @@ static int dpm_run_callback(pm_callback_t cb, struct device *dev,
 	if (!cb)
 		return 0;
 
-	calltime = initcall_debug_start(dev);
+	calltime = initcall_debug_start(dev, cb);
 
 	pm_dev_dbg(dev, state, info);
 	trace_device_pm_callback_start(dev, info, state.event);
@@ -1330,7 +1330,7 @@ static int legacy_suspend(struct device *dev, pm_message_t state,
 	int error;
 	ktime_t calltime;
 
-	calltime = initcall_debug_start(dev);
+	calltime = initcall_debug_start(dev, cb);
 
 	trace_device_pm_callback_start(dev, info, state.event);
 	error = cb(dev, state);
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RESEND PATCH 1/2] PM / sleep: print function name of callbacks
@ 2016-10-20  0:26 ` Brian Norris
  0 siblings, 0 replies; 25+ messages in thread
From: Brian Norris @ 2016-10-20  0:26 UTC (permalink / raw)
  To: Rafael J . Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman
  Cc: linux-kernel, Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov, Brian Norris

From: Douglas Anderson <dianders@chromium.org>

The printouts writen to the logs by suspend can be a bit opaque: it can
be hard to track them down to the actual function called.  You might
see:

  calling  rfkill1+ @ 19473, parent: phy0
  call rfkill1+ returned 0 after 1 usecs
  calling  phy0+ @ 19473, parent: mmc2:0001:1
  call phy0+ returned 0 after 19 usecs

It's a bit hard to know what's actually happening.  Instead, it's nice
to see:

  calling  rfkill1+ @ 15793, parent: phy0, cb: rfkill_suspend
  call rfkill1+ returned 0 after 1 usecs
  calling  phy0+ @ 15793, parent: mmc2:0001:1, cb: wiphy_suspend [cfg80211]
  call phy0+ returned 0 after 7 usecs

That makes it very obvious what's going on.  It also has the nice side
effect of making the suspend/resume spew a little more obvious, since
many resume functions have the word "resume" in the name:

  calling  phy0+ @ 15793, parent: mmc2:0001:1, cb: wiphy_resume [cfg80211]
  call phy0+ returned 0 after 12 usecs
  calling  rfkill1+ @ 15793, parent: phy0, cb: rfkill_resume
  call rfkill1+ returned 0 after 1 usecs

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
This is a resend of the following patch (w/ minor $subject alteration):

https://patchwork.kernel.org/patch/7241641/

It was useful for debugging the following patch, so I thought I'd resend. It
received an Ack the first time.

 drivers/base/power/main.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index e44944f4be77..c58563581345 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -190,14 +190,14 @@ void device_pm_move_last(struct device *dev)
 	list_move_tail(&dev->power.entry, &dpm_list);
 }
 
-static ktime_t initcall_debug_start(struct device *dev)
+static ktime_t initcall_debug_start(struct device *dev, void *cb)
 {
 	ktime_t calltime = ktime_set(0, 0);
 
 	if (pm_print_times_enabled) {
-		pr_info("calling  %s+ @ %i, parent: %s\n",
+		pr_info("calling  %s+ @ %i, parent: %s, cb: %pf\n",
 			dev_name(dev), task_pid_nr(current),
-			dev->parent ? dev_name(dev->parent) : "none");
+			dev->parent ? dev_name(dev->parent) : "none", cb);
 		calltime = ktime_get();
 	}
 
@@ -384,7 +384,7 @@ static int dpm_run_callback(pm_callback_t cb, struct device *dev,
 	if (!cb)
 		return 0;
 
-	calltime = initcall_debug_start(dev);
+	calltime = initcall_debug_start(dev, cb);
 
 	pm_dev_dbg(dev, state, info);
 	trace_device_pm_callback_start(dev, info, state.event);
@@ -1330,7 +1330,7 @@ static int legacy_suspend(struct device *dev, pm_message_t state,
 	int error;
 	ktime_t calltime;
 
-	calltime = initcall_debug_start(dev);
+	calltime = initcall_debug_start(dev, cb);
 
 	trace_device_pm_callback_start(dev, info, state.event);
 	error = cb(dev, state);
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,early} fails
  2016-10-20  0:26 ` Brian Norris
@ 2016-10-20  0:26   ` Brian Norris
  -1 siblings, 0 replies; 25+ messages in thread
From: Brian Norris @ 2016-10-20  0:26 UTC (permalink / raw)
  To: Rafael J . Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman
  Cc: linux-kernel, Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov, Brian Norris

Consider two devices, A and B, where B is a child of A, and B utilizes
asynchronous suspend (it does not matter whether A is sync or async). If
B fails to suspend_noirq() or suspend_early(), or is interrupted by a
wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
variable. However, device A does not (immediately) check the async_error
variable; it may continue to run its own suspend_noirq()/suspend_early()
callback. This is bad.

We can resolve this problem by checking the async_error flag after
waiting for children to suspend, using the same logic for the noirq and
late suspend cases as we already do for __device_suspend().

It's easy to observe this erroneous behavior by, for example, forcing a
device to sleep a bit in its suspend_noirq() (to ensure the parent is
waiting for the child to complete), then return an error, and watch the
parent suspend_noirq() still get called. (Or similarly, fake a wakeup
event at the right (or is it wrong?) time.)

Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late")
Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq")
Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
Tested mostly on 4.4, but the code (and seemingly the bug) is mostly untouched
since then. I can try to conjure up a proper upstream test if required.

 drivers/base/power/main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index c58563581345..eaf6b53463a5 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1040,6 +1040,9 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
 
 	dpm_wait_for_children(dev, async);
 
+	if (async_error)
+		goto Complete;
+
 	if (dev->pm_domain) {
 		info = "noirq power domain ";
 		callback = pm_noirq_op(&dev->pm_domain->ops, state);
@@ -1187,6 +1190,9 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
 
 	dpm_wait_for_children(dev, async);
 
+	if (async_error)
+		goto Complete;
+
 	if (dev->pm_domain) {
 		info = "late power domain ";
 		callback = pm_late_early_op(&dev->pm_domain->ops, state);
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,early} fails
@ 2016-10-20  0:26   ` Brian Norris
  0 siblings, 0 replies; 25+ messages in thread
From: Brian Norris @ 2016-10-20  0:26 UTC (permalink / raw)
  To: Rafael J . Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman
  Cc: linux-kernel, Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov, Brian Norris

Consider two devices, A and B, where B is a child of A, and B utilizes
asynchronous suspend (it does not matter whether A is sync or async). If
B fails to suspend_noirq() or suspend_early(), or is interrupted by a
wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
variable. However, device A does not (immediately) check the async_error
variable; it may continue to run its own suspend_noirq()/suspend_early()
callback. This is bad.

We can resolve this problem by checking the async_error flag after
waiting for children to suspend, using the same logic for the noirq and
late suspend cases as we already do for __device_suspend().

It's easy to observe this erroneous behavior by, for example, forcing a
device to sleep a bit in its suspend_noirq() (to ensure the parent is
waiting for the child to complete), then return an error, and watch the
parent suspend_noirq() still get called. (Or similarly, fake a wakeup
event at the right (or is it wrong?) time.)

Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late")
Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq")
Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
Tested mostly on 4.4, but the code (and seemingly the bug) is mostly untouched
since then. I can try to conjure up a proper upstream test if required.

 drivers/base/power/main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index c58563581345..eaf6b53463a5 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1040,6 +1040,9 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
 
 	dpm_wait_for_children(dev, async);
 
+	if (async_error)
+		goto Complete;
+
 	if (dev->pm_domain) {
 		info = "noirq power domain ";
 		callback = pm_noirq_op(&dev->pm_domain->ops, state);
@@ -1187,6 +1190,9 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
 
 	dpm_wait_for_children(dev, async);
 
+	if (async_error)
+		goto Complete;
+
 	if (dev->pm_domain) {
 		info = "late power domain ";
 		callback = pm_late_early_op(&dev->pm_domain->ops, state);
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,early} fails
  2016-10-20  0:26   ` Brian Norris
  (?)
@ 2016-10-20  0:46   ` Brian Norris
  2016-10-27 15:34     ` Greg Kroah-Hartman
  -1 siblings, 1 reply; 25+ messages in thread
From: Brian Norris @ 2016-10-20  0:46 UTC (permalink / raw)
  To: Rafael J . Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman
  Cc: linux-kernel, Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov

Ugh, as I hope the patch context makes clear, the subject should be

s/early/late/

as should the body of the commit message.

On Wed, Oct 19, 2016 at 05:26:10PM -0700, Brian Norris wrote:
> Consider two devices, A and B, where B is a child of A, and B utilizes
> asynchronous suspend (it does not matter whether A is sync or async). If
> B fails to suspend_noirq() or suspend_early(), or is interrupted by a

s/early/late/

> wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
> variable. However, device A does not (immediately) check the async_error
> variable; it may continue to run its own suspend_noirq()/suspend_early()

s/early/late/

> callback. This is bad.
> 
> We can resolve this problem by checking the async_error flag after
> waiting for children to suspend, using the same logic for the noirq and
> late suspend cases as we already do for __device_suspend().
> 
> It's easy to observe this erroneous behavior by, for example, forcing a
> device to sleep a bit in its suspend_noirq() (to ensure the parent is
> waiting for the child to complete), then return an error, and watch the
> parent suspend_noirq() still get called. (Or similarly, fake a wakeup
> event at the right (or is it wrong?) time.)
> 
> Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late")
> Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq")
> Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com>
> Signed-off-by: Brian Norris <briannorris@chromium.org>

If the patch is otherwise acceptable, feel free to make the above edits.
Or I can fix them up if I send v2.

Sorry for the noise,
Brian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RESEND PATCH 1/2] PM / sleep: print function name of callbacks
  2016-10-20  0:26 ` Brian Norris
  (?)
  (?)
@ 2016-10-20  0:52 ` Dmitry Torokhov
  -1 siblings, 0 replies; 25+ messages in thread
From: Dmitry Torokhov @ 2016-10-20  0:52 UTC (permalink / raw)
  To: Brian Norris
  Cc: Rafael J . Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	linux-kernel, Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu

On Wed, Oct 19, 2016 at 05:26:09PM -0700, Brian Norris wrote:
> From: Douglas Anderson <dianders@chromium.org>
> 
> The printouts writen to the logs by suspend can be a bit opaque: it can
> be hard to track them down to the actual function called.  You might
> see:
> 
>   calling  rfkill1+ @ 19473, parent: phy0
>   call rfkill1+ returned 0 after 1 usecs
>   calling  phy0+ @ 19473, parent: mmc2:0001:1
>   call phy0+ returned 0 after 19 usecs
> 
> It's a bit hard to know what's actually happening.  Instead, it's nice
> to see:
> 
>   calling  rfkill1+ @ 15793, parent: phy0, cb: rfkill_suspend
>   call rfkill1+ returned 0 after 1 usecs
>   calling  phy0+ @ 15793, parent: mmc2:0001:1, cb: wiphy_suspend [cfg80211]
>   call phy0+ returned 0 after 7 usecs
> 
> That makes it very obvious what's going on.  It also has the nice side
> effect of making the suspend/resume spew a little more obvious, since
> many resume functions have the word "resume" in the name:
> 
>   calling  phy0+ @ 15793, parent: mmc2:0001:1, cb: wiphy_resume [cfg80211]
>   call phy0+ returned 0 after 12 usecs
>   calling  rfkill1+ @ 15793, parent: phy0, cb: rfkill_resume
>   call rfkill1+ returned 0 after 1 usecs
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Acked-by: Pavel Machek <pavel@ucw.cz>
> Signed-off-by: Brian Norris <briannorris@chromium.org>

Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

> ---
> This is a resend of the following patch (w/ minor $subject alteration):
> 
> https://patchwork.kernel.org/patch/7241641/
> 
> It was useful for debugging the following patch, so I thought I'd resend. It
> received an Ack the first time.
> 
>  drivers/base/power/main.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index e44944f4be77..c58563581345 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -190,14 +190,14 @@ void device_pm_move_last(struct device *dev)
>  	list_move_tail(&dev->power.entry, &dpm_list);
>  }
>  
> -static ktime_t initcall_debug_start(struct device *dev)
> +static ktime_t initcall_debug_start(struct device *dev, void *cb)
>  {
>  	ktime_t calltime = ktime_set(0, 0);
>  
>  	if (pm_print_times_enabled) {
> -		pr_info("calling  %s+ @ %i, parent: %s\n",
> +		pr_info("calling  %s+ @ %i, parent: %s, cb: %pf\n",
>  			dev_name(dev), task_pid_nr(current),
> -			dev->parent ? dev_name(dev->parent) : "none");
> +			dev->parent ? dev_name(dev->parent) : "none", cb);
>  		calltime = ktime_get();
>  	}
>  
> @@ -384,7 +384,7 @@ static int dpm_run_callback(pm_callback_t cb, struct device *dev,
>  	if (!cb)
>  		return 0;
>  
> -	calltime = initcall_debug_start(dev);
> +	calltime = initcall_debug_start(dev, cb);
>  
>  	pm_dev_dbg(dev, state, info);
>  	trace_device_pm_callback_start(dev, info, state.event);
> @@ -1330,7 +1330,7 @@ static int legacy_suspend(struct device *dev, pm_message_t state,
>  	int error;
>  	ktime_t calltime;
>  
> -	calltime = initcall_debug_start(dev);
> +	calltime = initcall_debug_start(dev, cb);
>  
>  	trace_device_pm_callback_start(dev, info, state.event);
>  	error = cb(dev, state);
> -- 
> 2.8.0.rc3.226.g39d4020
> 

-- 
Dmitry

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,early} fails
  2016-10-20  0:26   ` Brian Norris
  (?)
  (?)
@ 2016-10-20  0:56   ` Dmitry Torokhov
  -1 siblings, 0 replies; 25+ messages in thread
From: Dmitry Torokhov @ 2016-10-20  0:56 UTC (permalink / raw)
  To: Brian Norris
  Cc: Rafael J . Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	linux-kernel, Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu

On Wed, Oct 19, 2016 at 05:26:10PM -0700, Brian Norris wrote:
> Consider two devices, A and B, where B is a child of A, and B utilizes
> asynchronous suspend (it does not matter whether A is sync or async). If
> B fails to suspend_noirq() or suspend_early(), or is interrupted by a
> wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
> variable. However, device A does not (immediately) check the async_error
> variable; it may continue to run its own suspend_noirq()/suspend_early()
> callback. This is bad.
> 
> We can resolve this problem by checking the async_error flag after
> waiting for children to suspend, using the same logic for the noirq and
> late suspend cases as we already do for __device_suspend().
> 
> It's easy to observe this erroneous behavior by, for example, forcing a
> device to sleep a bit in its suspend_noirq() (to ensure the parent is
> waiting for the child to complete), then return an error, and watch the
> parent suspend_noirq() still get called. (Or similarly, fake a wakeup
> event at the right (or is it wrong?) time.)
> 
> Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late")
> Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq")
> Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com>
> Signed-off-by: Brian Norris <briannorris@chromium.org>

Makes sense to me.

Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

> ---
> Tested mostly on 4.4, but the code (and seemingly the bug) is mostly untouched
> since then. I can try to conjure up a proper upstream test if required.
> 
>  drivers/base/power/main.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index c58563581345..eaf6b53463a5 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1040,6 +1040,9 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
>  
>  	dpm_wait_for_children(dev, async);
>  
> +	if (async_error)
> +		goto Complete;
> +
>  	if (dev->pm_domain) {
>  		info = "noirq power domain ";
>  		callback = pm_noirq_op(&dev->pm_domain->ops, state);
> @@ -1187,6 +1190,9 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
>  
>  	dpm_wait_for_children(dev, async);
>  
> +	if (async_error)
> +		goto Complete;
> +
>  	if (dev->pm_domain) {
>  		info = "late power domain ";
>  		callback = pm_late_early_op(&dev->pm_domain->ops, state);
> -- 
> 2.8.0.rc3.226.g39d4020
> 

-- 
Dmitry

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,early} fails
  2016-10-20  0:46   ` Brian Norris
@ 2016-10-27 15:34     ` Greg Kroah-Hartman
  2016-10-27 16:03       ` Brian Norris
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Kroah-Hartman @ 2016-10-27 15:34 UTC (permalink / raw)
  To: Brian Norris
  Cc: Rafael J . Wysocki, Pavel Machek, Len Brown, linux-kernel,
	Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov

On Wed, Oct 19, 2016 at 05:46:11PM -0700, Brian Norris wrote:
> Ugh, as I hope the patch context makes clear, the subject should be
> 
> s/early/late/
> 
> as should the body of the commit message.
> 
> On Wed, Oct 19, 2016 at 05:26:10PM -0700, Brian Norris wrote:
> > Consider two devices, A and B, where B is a child of A, and B utilizes
> > asynchronous suspend (it does not matter whether A is sync or async). If
> > B fails to suspend_noirq() or suspend_early(), or is interrupted by a
> 
> s/early/late/
> 
> > wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
> > variable. However, device A does not (immediately) check the async_error
> > variable; it may continue to run its own suspend_noirq()/suspend_early()
> 
> s/early/late/
> 
> > callback. This is bad.
> > 
> > We can resolve this problem by checking the async_error flag after
> > waiting for children to suspend, using the same logic for the noirq and
> > late suspend cases as we already do for __device_suspend().
> > 
> > It's easy to observe this erroneous behavior by, for example, forcing a
> > device to sleep a bit in its suspend_noirq() (to ensure the parent is
> > waiting for the child to complete), then return an error, and watch the
> > parent suspend_noirq() still get called. (Or similarly, fake a wakeup
> > event at the right (or is it wrong?) time.)
> > 
> > Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late")
> > Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq")
> > Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com>
> > Signed-off-by: Brian Norris <briannorris@chromium.org>
> 
> If the patch is otherwise acceptable, feel free to make the above edits.
> Or I can fix them up if I send v2.

Please fix up, we should never have to hand-edit a changelog text...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,early} fails
  2016-10-27 15:34     ` Greg Kroah-Hartman
@ 2016-10-27 16:03       ` Brian Norris
  0 siblings, 0 replies; 25+ messages in thread
From: Brian Norris @ 2016-10-27 16:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Rafael J . Wysocki, Pavel Machek, Len Brown, linux-kernel,
	Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov

On Thu, Oct 27, 2016 at 05:34:06PM +0200, Greg Kroah-Hartman wrote:
> On Wed, Oct 19, 2016 at 05:46:11PM -0700, Brian Norris wrote:
> > Ugh, as I hope the patch context makes clear, the subject should be
> > 
> > s/early/late/
> > 
> > as should the body of the commit message.
[...]
> > If the patch is otherwise acceptable, feel free to make the above edits.
> > Or I can fix them up if I send v2.
> 
> Please fix up, we should never have to hand-edit a changelog text...

Sorry, I figured it was possible to get review without spamming
everyone. If I was going to need a v2 anyway, for example, then why
bother for a trivial change.

I'll send v2 (of just this patch) shortly.

Thanks,
Brian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails
  2016-10-20  0:26   ` Brian Norris
@ 2016-10-27 16:05     ` Brian Norris
  -1 siblings, 0 replies; 25+ messages in thread
From: Brian Norris @ 2016-10-27 16:05 UTC (permalink / raw)
  To: Rafael J . Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman
  Cc: linux-kernel, Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov, Brian Norris

Consider two devices, A and B, where B is a child of A, and B utilizes
asynchronous suspend (it does not matter whether A is sync or async). If
B fails to suspend_noirq() or suspend_late(), or is interrupted by a
wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
variable. However, device A does not (immediately) check the async_error
variable; it may continue to run its own suspend_noirq()/suspend_late()
callback. This is bad.

We can resolve this problem by checking the async_error flag after
waiting for children to suspend, using the same logic for the noirq and
late suspend cases as we already do for __device_suspend().

It's easy to observe this erroneous behavior by, for example, forcing a
device to sleep a bit in its suspend_noirq() (to ensure the parent is
waiting for the child to complete), then return an error, and watch the
parent suspend_noirq() still get called. (Or similarly, fake a wakeup
event at the right (or is it wrong?) time.)

Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late")
Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq")
Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
v2: s/early/late/ in commit message

 drivers/base/power/main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index c58563581345..eaf6b53463a5 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1040,6 +1040,9 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
 
 	dpm_wait_for_children(dev, async);
 
+	if (async_error)
+		goto Complete;
+
 	if (dev->pm_domain) {
 		info = "noirq power domain ";
 		callback = pm_noirq_op(&dev->pm_domain->ops, state);
@@ -1187,6 +1190,9 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
 
 	dpm_wait_for_children(dev, async);
 
+	if (async_error)
+		goto Complete;
+
 	if (dev->pm_domain) {
 		info = "late power domain ";
 		callback = pm_late_early_op(&dev->pm_domain->ops, state);
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails
@ 2016-10-27 16:05     ` Brian Norris
  0 siblings, 0 replies; 25+ messages in thread
From: Brian Norris @ 2016-10-27 16:05 UTC (permalink / raw)
  To: Rafael J . Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman
  Cc: linux-kernel, Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov, Brian Norris

Consider two devices, A and B, where B is a child of A, and B utilizes
asynchronous suspend (it does not matter whether A is sync or async). If
B fails to suspend_noirq() or suspend_late(), or is interrupted by a
wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
variable. However, device A does not (immediately) check the async_error
variable; it may continue to run its own suspend_noirq()/suspend_late()
callback. This is bad.

We can resolve this problem by checking the async_error flag after
waiting for children to suspend, using the same logic for the noirq and
late suspend cases as we already do for __device_suspend().

It's easy to observe this erroneous behavior by, for example, forcing a
device to sleep a bit in its suspend_noirq() (to ensure the parent is
waiting for the child to complete), then return an error, and watch the
parent suspend_noirq() still get called. (Or similarly, fake a wakeup
event at the right (or is it wrong?) time.)

Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late")
Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq")
Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
v2: s/early/late/ in commit message

 drivers/base/power/main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index c58563581345..eaf6b53463a5 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1040,6 +1040,9 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
 
 	dpm_wait_for_children(dev, async);
 
+	if (async_error)
+		goto Complete;
+
 	if (dev->pm_domain) {
 		info = "noirq power domain ";
 		callback = pm_noirq_op(&dev->pm_domain->ops, state);
@@ -1187,6 +1190,9 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
 
 	dpm_wait_for_children(dev, async);
 
+	if (async_error)
+		goto Complete;
+
 	if (dev->pm_domain) {
 		info = "late power domain ";
 		callback = pm_late_early_op(&dev->pm_domain->ops, state);
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails
  2016-10-27 16:05     ` Brian Norris
  (?)
@ 2016-11-01  4:25     ` Rafael J. Wysocki
  2016-11-01  5:22       ` Brian Norris
  2016-11-01  6:04       ` Dmitry Torokhov
  -1 siblings, 2 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-11-01  4:25 UTC (permalink / raw)
  To: Brian Norris
  Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, linux-kernel,
	Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov

On Thursday, October 27, 2016 09:05:34 AM Brian Norris wrote:
> Consider two devices, A and B, where B is a child of A, and B utilizes
> asynchronous suspend (it does not matter whether A is sync or async). If
> B fails to suspend_noirq() or suspend_late(), or is interrupted by a
> wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
> variable. However, device A does not (immediately) check the async_error
> variable; it may continue to run its own suspend_noirq()/suspend_late()
> callback. This is bad.
> 
> We can resolve this problem by checking the async_error flag after
> waiting for children to suspend, using the same logic for the noirq and
> late suspend cases as we already do for __device_suspend().
> 
> It's easy to observe this erroneous behavior by, for example, forcing a
> device to sleep a bit in its suspend_noirq() (to ensure the parent is
> waiting for the child to complete), then return an error, and watch the
> parent suspend_noirq() still get called. (Or similarly, fake a wakeup
> event at the right (or is it wrong?) time.)
> 
> Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late")
> Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq")
> Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com>
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
> ---
> v2: s/early/late/ in commit message
> 
>  drivers/base/power/main.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index c58563581345..eaf6b53463a5 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1040,6 +1040,9 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
>  
>  	dpm_wait_for_children(dev, async);
>  
> +	if (async_error)
> +		goto Complete;
> +

This is a second chech for async_error in this routine and is the first one
really needed after adding this?

>  	if (dev->pm_domain) {
>  		info = "noirq power domain ";
>  		callback = pm_noirq_op(&dev->pm_domain->ops, state);
> @@ -1187,6 +1190,9 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
>  
>  	dpm_wait_for_children(dev, async);
>  
> +	if (async_error)
> +		goto Complete;
> +

Same question.

>  	if (dev->pm_domain) {
>  		info = "late power domain ";
>  		callback = pm_late_early_op(&dev->pm_domain->ops, state);
> 

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RESEND PATCH 1/2] PM / sleep: print function name of callbacks
  2016-10-20  0:26 ` Brian Norris
                   ` (2 preceding siblings ...)
  (?)
@ 2016-11-01  4:27 ` Rafael J. Wysocki
  2016-11-02 21:02   ` Brian Norris
  -1 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-11-01  4:27 UTC (permalink / raw)
  To: Brian Norris
  Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, linux-kernel,
	Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov

On Wednesday, October 19, 2016 05:26:09 PM Brian Norris wrote:
> From: Douglas Anderson <dianders@chromium.org>
> 
> The printouts writen to the logs by suspend can be a bit opaque: it can
> be hard to track them down to the actual function called.  You might
> see:
> 
>   calling  rfkill1+ @ 19473, parent: phy0
>   call rfkill1+ returned 0 after 1 usecs
>   calling  phy0+ @ 19473, parent: mmc2:0001:1
>   call phy0+ returned 0 after 19 usecs
> 
> It's a bit hard to know what's actually happening.  Instead, it's nice
> to see:
> 
>   calling  rfkill1+ @ 15793, parent: phy0, cb: rfkill_suspend
>   call rfkill1+ returned 0 after 1 usecs
>   calling  phy0+ @ 15793, parent: mmc2:0001:1, cb: wiphy_suspend [cfg80211]
>   call phy0+ returned 0 after 7 usecs
> 
> That makes it very obvious what's going on.  It also has the nice side
> effect of making the suspend/resume spew a little more obvious, since
> many resume functions have the word "resume" in the name:
> 
>   calling  phy0+ @ 15793, parent: mmc2:0001:1, cb: wiphy_resume [cfg80211]
>   call phy0+ returned 0 after 12 usecs
>   calling  rfkill1+ @ 15793, parent: phy0, cb: rfkill_resume
>   call rfkill1+ returned 0 after 1 usecs
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Acked-by: Pavel Machek <pavel@ucw.cz>
> Signed-off-by: Brian Norris <briannorris@chromium.org>

Any reason why you need to rely on the initcall_debug stuff instead of using
the tracepoints we have there (for exactly the reason why you are pushing this
patch)?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails
  2016-11-01  4:25     ` Rafael J. Wysocki
@ 2016-11-01  5:22       ` Brian Norris
  2016-11-01  6:04       ` Dmitry Torokhov
  1 sibling, 0 replies; 25+ messages in thread
From: Brian Norris @ 2016-11-01  5:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, linux-kernel,
	Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov

Hi Rafael,

On Tue, Nov 01, 2016 at 05:25:39AM +0100, Rafael J. Wysocki wrote:
> On Thursday, October 27, 2016 09:05:34 AM Brian Norris wrote:
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index c58563581345..eaf6b53463a5 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -1040,6 +1040,9 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
> >  
> >  	dpm_wait_for_children(dev, async);
> >  
> > +	if (async_error)
> > +		goto Complete;
> > +
> 
> This is a second chech for async_error in this routine and is the first one
> really needed after adding this?

Maybe not? I confess I'm not 100% sure on all the reasons for the code
structure as-is, but it looks like we're trying to catch pending wakeups
early, and because that procedure utilizes 'async_error' to stash the
-EBUSY, it seemingly makes sense to check if it's non-zero before
overwriting it.

But then, that's all kind of racy, since there can be multiple writers
to that variable, no? So it can't matter *that* much if we clobber the
error, as long as we abort somewhere.

Anyway, maybe it's best if dpm_wait_for_children() just moves to be
first thing in this function (after the tracepoints). That seems just as
correct to me, and shouldn't waste any additional time suspending
devices for a failed system suspend attempt -- as long as we're still
catching wakeups before we suspend the current device. (That also
incidentally matches the structure of __device_suspend() more closely.
Why did this all get out of sync (pun unintended) when copied from the
suspend() to the suspend_{late,noirq}() phase?)

All in all, the short response is that I wrote the smallest patch that
fixes the bug, AFAICT. But actually I think the above would be both
shorter and better. I'll give that a go.

> >  	if (dev->pm_domain) {
> >  		info = "noirq power domain ";
> >  		callback = pm_noirq_op(&dev->pm_domain->ops, state);
> > @@ -1187,6 +1190,9 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
> >  
> >  	dpm_wait_for_children(dev, async);
> >  
> > +	if (async_error)
> > +		goto Complete;
> > +
> 
> Same question.

Same answer :)

Brian

> >  	if (dev->pm_domain) {
> >  		info = "late power domain ";
> >  		callback = pm_late_early_op(&dev->pm_domain->ops, state);
> > 
> 
> Thanks,
> Rafael
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails
  2016-11-01  4:25     ` Rafael J. Wysocki
  2016-11-01  5:22       ` Brian Norris
@ 2016-11-01  6:04       ` Dmitry Torokhov
  2016-11-02  3:51         ` Rafael J. Wysocki
  1 sibling, 1 reply; 25+ messages in thread
From: Dmitry Torokhov @ 2016-11-01  6:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Brian Norris, Pavel Machek, Len Brown, Greg Kroah-Hartman, lkml,
	Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu

On Mon, Oct 31, 2016 at 10:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Thursday, October 27, 2016 09:05:34 AM Brian Norris wrote:
>> Consider two devices, A and B, where B is a child of A, and B utilizes
>> asynchronous suspend (it does not matter whether A is sync or async). If
>> B fails to suspend_noirq() or suspend_late(), or is interrupted by a
>> wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
>> variable. However, device A does not (immediately) check the async_error
>> variable; it may continue to run its own suspend_noirq()/suspend_late()
>> callback. This is bad.
>>
>> We can resolve this problem by checking the async_error flag after
>> waiting for children to suspend, using the same logic for the noirq and
>> late suspend cases as we already do for __device_suspend().
>>
>> It's easy to observe this erroneous behavior by, for example, forcing a
>> device to sleep a bit in its suspend_noirq() (to ensure the parent is
>> waiting for the child to complete), then return an error, and watch the
>> parent suspend_noirq() still get called. (Or similarly, fake a wakeup
>> event at the right (or is it wrong?) time.)
>>
>> Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late")
>> Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq")
>> Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com>
>> Signed-off-by: Brian Norris <briannorris@chromium.org>
>> Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
>> ---
>> v2: s/early/late/ in commit message
>>
>>  drivers/base/power/main.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>> index c58563581345..eaf6b53463a5 100644
>> --- a/drivers/base/power/main.c
>> +++ b/drivers/base/power/main.c
>> @@ -1040,6 +1040,9 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
>>
>>       dpm_wait_for_children(dev, async);
>>
>> +     if (async_error)
>> +             goto Complete;
>> +
>
> This is a second chech for async_error in this routine and is the first one
> really needed after adding this?

There is really no point in waiting for children to be suspended if
error has already been signalled; that's what first check achieves.
The 2nd check ensures that we abort suspend if any of the children
failed to suspend.

I'd say both checks are needed (well, 1st is helpful, 2nd is essential).

>
>>       if (dev->pm_domain) {
>>               info = "noirq power domain ";
>>               callback = pm_noirq_op(&dev->pm_domain->ops, state);
>> @@ -1187,6 +1190,9 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
>>
>>       dpm_wait_for_children(dev, async);
>>
>> +     if (async_error)
>> +             goto Complete;
>> +
>
> Same question.
>
>>       if (dev->pm_domain) {
>>               info = "late power domain ";
>>               callback = pm_late_early_op(&dev->pm_domain->ops, state);
>>
>
> Thanks,
> Rafael
>

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails
  2016-11-01  6:04       ` Dmitry Torokhov
@ 2016-11-02  3:51         ` Rafael J. Wysocki
  2016-11-02  5:07           ` Brian Norris
  0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-11-02  3:51 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Brian Norris, Pavel Machek, Len Brown, Greg Kroah-Hartman, lkml,
	Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu

On Tuesday, November 01, 2016 12:04:28 AM Dmitry Torokhov wrote:
> On Mon, Oct 31, 2016 at 10:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Thursday, October 27, 2016 09:05:34 AM Brian Norris wrote:
> >> Consider two devices, A and B, where B is a child of A, and B utilizes
> >> asynchronous suspend (it does not matter whether A is sync or async). If
> >> B fails to suspend_noirq() or suspend_late(), or is interrupted by a
> >> wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
> >> variable. However, device A does not (immediately) check the async_error
> >> variable; it may continue to run its own suspend_noirq()/suspend_late()
> >> callback. This is bad.
> >>
> >> We can resolve this problem by checking the async_error flag after
> >> waiting for children to suspend, using the same logic for the noirq and
> >> late suspend cases as we already do for __device_suspend().
> >>
> >> It's easy to observe this erroneous behavior by, for example, forcing a
> >> device to sleep a bit in its suspend_noirq() (to ensure the parent is
> >> waiting for the child to complete), then return an error, and watch the
> >> parent suspend_noirq() still get called. (Or similarly, fake a wakeup
> >> event at the right (or is it wrong?) time.)
> >>
> >> Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late")
> >> Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq")
> >> Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com>
> >> Signed-off-by: Brian Norris <briannorris@chromium.org>
> >> Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
> >> ---
> >> v2: s/early/late/ in commit message
> >>
> >>  drivers/base/power/main.c | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> >> index c58563581345..eaf6b53463a5 100644
> >> --- a/drivers/base/power/main.c
> >> +++ b/drivers/base/power/main.c
> >> @@ -1040,6 +1040,9 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
> >>
> >>       dpm_wait_for_children(dev, async);
> >>
> >> +     if (async_error)
> >> +             goto Complete;
> >> +
> >
> > This is a second chech for async_error in this routine and is the first one
> > really needed after adding this?
> 
> There is really no point in waiting for children to be suspended if
> error has already been signalled; that's what first check achieves.
> The 2nd check ensures that we abort suspend if any of the children
> failed to suspend.
> 
> I'd say both checks are needed (well, 1st is helpful, 2nd is essential).

OK, fair enough.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails
  2016-11-02  3:51         ` Rafael J. Wysocki
@ 2016-11-02  5:07           ` Brian Norris
  2016-11-10  0:08             ` Rafael J. Wysocki
  0 siblings, 1 reply; 25+ messages in thread
From: Brian Norris @ 2016-11-02  5:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Dmitry Torokhov, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	lkml, Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Kevin Hilman, Ulf Hansson

+ more genpd folks

On Wed, Nov 02, 2016 at 04:51:08AM +0100, Rafael J. Wysocki wrote:
> On Tuesday, November 01, 2016 12:04:28 AM Dmitry Torokhov wrote:
> > On Mon, Oct 31, 2016 at 10:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > On Thursday, October 27, 2016 09:05:34 AM Brian Norris wrote:
> > >> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > >> index c58563581345..eaf6b53463a5 100644
> > >> --- a/drivers/base/power/main.c
> > >> +++ b/drivers/base/power/main.c
> > >> @@ -1040,6 +1040,9 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
> > >>
> > >>       dpm_wait_for_children(dev, async);
> > >>
> > >> +     if (async_error)
> > >> +             goto Complete;
> > >> +
> > >
> > > This is a second chech for async_error in this routine and is the first one
> > > really needed after adding this?
> > 
> > There is really no point in waiting for children to be suspended if
> > error has already been signalled; that's what first check achieves.
> > The 2nd check ensures that we abort suspend if any of the children
> > failed to suspend.
> > 
> > I'd say both checks are needed (well, 1st is helpful, 2nd is essential).
> 
> OK, fair enough.

Sort of agreed, although I'm still not sure how helpful the 1st one is;
kinda serves to complicate things, for little real benefit IMO (you
don't save much time by "not waiting" -- either the child quickly
notices the same error and complete()'s quickly, or else you're going to
wait for that child in the end anyway).

I think it's also important to ask why we do this optimization in the
{late,noirq} cases, but we don't do this in __device_suspend(). As
demonstrated by the $subject bug, I think we would yield fewer bugs by
sharing code structure (if not the code itself) among the similar
phases.

I'm happy for you to take my current patch, of course, but I think some
further effort on making this consistent might be warranted. Either put
all of these short-circuit checks after the wait_for_children(), or else
add the same short-circuit for the missing case (__device_suspend()).
i.e., this (untested) patch:

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index e44944f4be77..2932a5bd892f 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1027,6 +1027,8 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
 	TRACE_DEVICE(dev);
 	TRACE_SUSPEND(0);
 
+	dpm_wait_for_children(dev, async);
+
 	if (async_error)
 		goto Complete;
 
@@ -1038,8 +1040,6 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
 	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;
 
-	dpm_wait_for_children(dev, async);
-
 	if (dev->pm_domain) {
 		info = "noirq power domain ";
 		callback = pm_noirq_op(&dev->pm_domain->ops, state);
@@ -1174,6 +1174,8 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
 
 	__pm_runtime_disable(dev, false);
 
+	dpm_wait_for_children(dev, async);
+
 	if (async_error)
 		goto Complete;
 
@@ -1185,8 +1187,6 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
 	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;
 
-	dpm_wait_for_children(dev, async);
-
 	if (dev->pm_domain) {
 		info = "late power domain ";
 		callback = pm_late_early_op(&dev->pm_domain->ops, state);

---

I can test this and send it in proper form if that looks preferable.

P.S. To get slightly off-topic here (but speaking of noirq bugs): I
noticed the genpd code has comments like this scattered all over:

 * This function is only called in "noirq" and "syscore" stages of system power
 * transitions, so it need not acquire locks (all of the "noirq" callbacks are
 * executed sequentially, so it is guaranteed that it will never run twice in
 * parallel).

Isn't that no longer true, now that noirq suspend can be asynchronous?
Maybe we should grep for the phrase "need not acquire locks" throughout
the kernel, in order to find low-hanging fruit for race conditions :)

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RESEND PATCH 1/2] PM / sleep: print function name of callbacks
  2016-11-01  4:27 ` Rafael J. Wysocki
@ 2016-11-02 21:02   ` Brian Norris
  0 siblings, 0 replies; 25+ messages in thread
From: Brian Norris @ 2016-11-02 21:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, linux-kernel,
	Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov

On Tue, Nov 01, 2016 at 05:27:05AM +0100, Rafael J. Wysocki wrote:
> Any reason why you need to rely on the initcall_debug stuff instead of using
> the tracepoints we have there (for exactly the reason why you are pushing this
> patch)?

This was mentioned on the last submission. I'll paste Doug's reply from
there:

"""
On Tue, Sep 22, 2015 at 10:33 AM, Brown, Len <len.brown@intel.com>
wrote:
> have you used analyze_suspend?
> It used to parse this output, but that was abandoned
> when it cut over to using ftrace directly.
>
> https://01.org/suspendresume

Ah, good to know about.  In my case this output is enabled on shipping
devices and sometimes we get back bug reports with these prints in the
log, so I'm interested in making the prints more useful.  :)

-Doug
"""

Perhaps I should have elabotated on the RESEND...

Or is there a really good way to make use of the tracepoints for
production systems, including for crash reports? We currently get the
kernel log from pstore, but I see ftrace support has been there for a
little while too. Still, it's awfully useful to have kernel prints and
warnings interleave with initcall_debug, and it wouldn't be quite as
easy to have to piece this together between kernel log + ftrace log.

Brian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails
  2016-11-02  5:07           ` Brian Norris
@ 2016-11-10  0:08             ` Rafael J. Wysocki
  2016-11-10  0:18               ` Brian Norris
  0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-11-10  0:08 UTC (permalink / raw)
  To: Brian Norris
  Cc: Rafael J. Wysocki, Dmitry Torokhov, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, lkml, Doug Anderson, Brian Norris,
	Jeffy Chen, linux-pm, Chuansheng Liu, Kevin Hilman, Ulf Hansson

On Wed, Nov 2, 2016 at 6:07 AM, Brian Norris <briannorris@chromium.org> wrote:
> + more genpd folks
>
> On Wed, Nov 02, 2016 at 04:51:08AM +0100, Rafael J. Wysocki wrote:
>> On Tuesday, November 01, 2016 12:04:28 AM Dmitry Torokhov wrote:
>> > On Mon, Oct 31, 2016 at 10:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > > On Thursday, October 27, 2016 09:05:34 AM Brian Norris wrote:
>> > >> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>> > >> index c58563581345..eaf6b53463a5 100644
>> > >> --- a/drivers/base/power/main.c
>> > >> +++ b/drivers/base/power/main.c
>> > >> @@ -1040,6 +1040,9 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
>> > >>
>> > >>       dpm_wait_for_children(dev, async);
>> > >>
>> > >> +     if (async_error)
>> > >> +             goto Complete;
>> > >> +
>> > >
>> > > This is a second chech for async_error in this routine and is the first one
>> > > really needed after adding this?
>> >
>> > There is really no point in waiting for children to be suspended if
>> > error has already been signalled; that's what first check achieves.
>> > The 2nd check ensures that we abort suspend if any of the children
>> > failed to suspend.
>> >
>> > I'd say both checks are needed (well, 1st is helpful, 2nd is essential).
>>
>> OK, fair enough.
>
> Sort of agreed, although I'm still not sure how helpful the 1st one is;
> kinda serves to complicate things, for little real benefit IMO (you
> don't save much time by "not waiting" -- either the child quickly
> notices the same error and complete()'s quickly, or else you're going to
> wait for that child in the end anyway).
>
> I think it's also important to ask why we do this optimization in the
> {late,noirq} cases, but we don't do this in __device_suspend(). As
> demonstrated by the $subject bug, I think we would yield fewer bugs by
> sharing code structure (if not the code itself) among the similar
> phases.
>
> I'm happy for you to take my current patch, of course, but I think some
> further effort on making this consistent might be warranted. Either put
> all of these short-circuit checks after the wait_for_children(), or else
> add the same short-circuit for the missing case (__device_suspend()).
> i.e., this (untested) patch:
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index e44944f4be77..2932a5bd892f 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1027,6 +1027,8 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
>         TRACE_DEVICE(dev);
>         TRACE_SUSPEND(0);
>
> +       dpm_wait_for_children(dev, async);
> +
>         if (async_error)
>                 goto Complete;
>
> @@ -1038,8 +1040,6 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
>         if (dev->power.syscore || dev->power.direct_complete)
>                 goto Complete;
>
> -       dpm_wait_for_children(dev, async);
> -
>         if (dev->pm_domain) {
>                 info = "noirq power domain ";
>                 callback = pm_noirq_op(&dev->pm_domain->ops, state);
> @@ -1174,6 +1174,8 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
>
>         __pm_runtime_disable(dev, false);
>
> +       dpm_wait_for_children(dev, async);
> +
>         if (async_error)
>                 goto Complete;
>
> @@ -1185,8 +1187,6 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
>         if (dev->power.syscore || dev->power.direct_complete)
>                 goto Complete;
>
> -       dpm_wait_for_children(dev, async);
> -
>         if (dev->pm_domain) {
>                 info = "late power domain ";
>                 callback = pm_late_early_op(&dev->pm_domain->ops, state);
>
> ---
>
> I can test this and send it in proper form if that looks preferable.

It does to me as per the discussion at the LPC.

Are you still going to submit it?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails
  2016-11-10  0:08             ` Rafael J. Wysocki
@ 2016-11-10  0:18               ` Brian Norris
  0 siblings, 0 replies; 25+ messages in thread
From: Brian Norris @ 2016-11-10  0:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Dmitry Torokhov, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, lkml, Doug Anderson, Brian Norris,
	Jeffy Chen, linux-pm, Chuansheng Liu, Kevin Hilman, Ulf Hansson

On Thu, Nov 10, 2016 at 01:08:56AM +0100, Rafael J. Wysocki wrote:
> On Wed, Nov 2, 2016 at 6:07 AM, Brian Norris <briannorris@chromium.org> wrote:
> > I can test this and send it in proper form if that looks preferable.
> 
> It does to me as per the discussion at the LPC.
> 
> Are you still going to submit it?

Yes, it just moved down my mental list of things. Will do today.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails
  2016-10-27 16:05     ` Brian Norris
@ 2016-11-10  1:21       ` Brian Norris
  -1 siblings, 0 replies; 25+ messages in thread
From: Brian Norris @ 2016-11-10  1:21 UTC (permalink / raw)
  To: Rafael J . Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman
  Cc: linux-kernel, Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov, Brian Norris

Consider two devices, A and B, where B is a child of A, and B utilizes
asynchronous suspend (it does not matter whether A is sync or async). If
B fails to suspend_noirq() or suspend_late(), or is interrupted by a
wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
variable. However, device A does not (immediately) check the async_error
variable; it may continue to run its own suspend_noirq()/suspend_late()
callback. This is bad.

We can resolve this problem by doing our error and wakeup checking
(particularly, for the async_error flag) after waiting for children to
suspend, instead of before. This also helps align the logic for the noirq and
late suspend cases with the logic in __device_suspend().

It's easy to observe this erroneous behavior by, for example, forcing a
device to sleep a bit in its suspend_noirq() (to ensure the parent is
waiting for the child to complete), then return an error, and watch the
parent suspend_noirq() still get called. (Or similarly, fake a wakeup
event at the right (or is it wrong?) time.)

Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late")
Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq")
Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
v2: s/early/late/ in commit message

v3:
 * drop patch 1, as the callback-printing is unrelated, semi-controversial, and
   might break existing (but poor -- c'mon, since when do tools get to rely on
   kernel messages?) tools
 * do all error checking after dpm_wait_for_children() -- this helps make
   things consistent and reduces duplication.
 * drop Dmitry's Reviewed-by, since the patch changed enough

 drivers/base/power/main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index c58563581345..57a8ca4bc8ab 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1027,6 +1027,8 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
 	TRACE_DEVICE(dev);
 	TRACE_SUSPEND(0);
 
+	dpm_wait_for_children(dev, async);
+
 	if (async_error)
 		goto Complete;
 
@@ -1038,8 +1040,6 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
 	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;
 
-	dpm_wait_for_children(dev, async);
-
 	if (dev->pm_domain) {
 		info = "noirq power domain ";
 		callback = pm_noirq_op(&dev->pm_domain->ops, state);
@@ -1174,6 +1174,8 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
 
 	__pm_runtime_disable(dev, false);
 
+	dpm_wait_for_children(dev, async);
+
 	if (async_error)
 		goto Complete;
 
@@ -1185,8 +1187,6 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
 	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;
 
-	dpm_wait_for_children(dev, async);
-
 	if (dev->pm_domain) {
 		info = "late power domain ";
 		callback = pm_late_early_op(&dev->pm_domain->ops, state);
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails
@ 2016-11-10  1:21       ` Brian Norris
  0 siblings, 0 replies; 25+ messages in thread
From: Brian Norris @ 2016-11-10  1:21 UTC (permalink / raw)
  To: Rafael J . Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman
  Cc: linux-kernel, Doug Anderson, Brian Norris, Jeffy Chen, linux-pm,
	Chuansheng Liu, Dmitry Torokhov, Brian Norris

Consider two devices, A and B, where B is a child of A, and B utilizes
asynchronous suspend (it does not matter whether A is sync or async). If
B fails to suspend_noirq() or suspend_late(), or is interrupted by a
wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
variable. However, device A does not (immediately) check the async_error
variable; it may continue to run its own suspend_noirq()/suspend_late()
callback. This is bad.

We can resolve this problem by doing our error and wakeup checking
(particularly, for the async_error flag) after waiting for children to
suspend, instead of before. This also helps align the logic for the noirq and
late suspend cases with the logic in __device_suspend().

It's easy to observe this erroneous behavior by, for example, forcing a
device to sleep a bit in its suspend_noirq() (to ensure the parent is
waiting for the child to complete), then return an error, and watch the
parent suspend_noirq() still get called. (Or similarly, fake a wakeup
event at the right (or is it wrong?) time.)

Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late")
Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq")
Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
v2: s/early/late/ in commit message

v3:
 * drop patch 1, as the callback-printing is unrelated, semi-controversial, and
   might break existing (but poor -- c'mon, since when do tools get to rely on
   kernel messages?) tools
 * do all error checking after dpm_wait_for_children() -- this helps make
   things consistent and reduces duplication.
 * drop Dmitry's Reviewed-by, since the patch changed enough

 drivers/base/power/main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index c58563581345..57a8ca4bc8ab 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1027,6 +1027,8 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
 	TRACE_DEVICE(dev);
 	TRACE_SUSPEND(0);
 
+	dpm_wait_for_children(dev, async);
+
 	if (async_error)
 		goto Complete;
 
@@ -1038,8 +1040,6 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
 	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;
 
-	dpm_wait_for_children(dev, async);
-
 	if (dev->pm_domain) {
 		info = "noirq power domain ";
 		callback = pm_noirq_op(&dev->pm_domain->ops, state);
@@ -1174,6 +1174,8 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
 
 	__pm_runtime_disable(dev, false);
 
+	dpm_wait_for_children(dev, async);
+
 	if (async_error)
 		goto Complete;
 
@@ -1185,8 +1187,6 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
 	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;
 
-	dpm_wait_for_children(dev, async);
-
 	if (dev->pm_domain) {
 		info = "late power domain ";
 		callback = pm_late_early_op(&dev->pm_domain->ops, state);
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails
  2016-11-10  1:21       ` Brian Norris
  (?)
@ 2016-11-10  1:53       ` Rafael J. Wysocki
  2016-11-10  2:00         ` Brian Norris
  -1 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-11-10  1:53 UTC (permalink / raw)
  To: Brian Norris
  Cc: Rafael J . Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Linux Kernel Mailing List, Doug Anderson, Brian Norris,
	Jeffy Chen, Linux PM, Chuansheng Liu, Dmitry Torokhov

On Thu, Nov 10, 2016 at 2:21 AM, Brian Norris <briannorris@chromium.org> wrote:
> Consider two devices, A and B, where B is a child of A, and B utilizes
> asynchronous suspend (it does not matter whether A is sync or async). If
> B fails to suspend_noirq() or suspend_late(), or is interrupted by a
> wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
> variable. However, device A does not (immediately) check the async_error
> variable; it may continue to run its own suspend_noirq()/suspend_late()
> callback. This is bad.
>
> We can resolve this problem by doing our error and wakeup checking
> (particularly, for the async_error flag) after waiting for children to
> suspend, instead of before. This also helps align the logic for the noirq and
> late suspend cases with the logic in __device_suspend().
>
> It's easy to observe this erroneous behavior by, for example, forcing a
> device to sleep a bit in its suspend_noirq() (to ensure the parent is
> waiting for the child to complete), then return an error, and watch the
> parent suspend_noirq() still get called. (Or similarly, fake a wakeup
> event at the right (or is it wrong?) time.)
>
> Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late")
> Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq")
> Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com>
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> ---
> v2: s/early/late/ in commit message
>
> v3:
>  * drop patch 1, as the callback-printing is unrelated, semi-controversial, and
>    might break existing (but poor -- c'mon, since when do tools get to rely on
>    kernel messages?) tools
>  * do all error checking after dpm_wait_for_children() -- this helps make
>    things consistent and reduces duplication.
>  * drop Dmitry's Reviewed-by, since the patch changed enough
>
>  drivers/base/power/main.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index c58563581345..57a8ca4bc8ab 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1027,6 +1027,8 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
>         TRACE_DEVICE(dev);
>         TRACE_SUSPEND(0);
>
> +       dpm_wait_for_children(dev, async);
> +

On a second thought. I'd move the

if (dev->power.syscore || dev->power.direct_complete)

along with this (and put it in front), because those flags won't
change while children are being waited on anyway.

>         if (async_error)
>                 goto Complete;
>
> @@ -1038,8 +1040,6 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
>         if (dev->power.syscore || dev->power.direct_complete)
>                 goto Complete;
>
> -       dpm_wait_for_children(dev, async);
> -
>         if (dev->pm_domain) {
>                 info = "noirq power domain ";
>                 callback = pm_noirq_op(&dev->pm_domain->ops, state);
> @@ -1174,6 +1174,8 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
>
>         __pm_runtime_disable(dev, false);
>
> +       dpm_wait_for_children(dev, async);
> +

And analogously here.

>         if (async_error)
>                 goto Complete;
>
> @@ -1185,8 +1187,6 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
>         if (dev->power.syscore || dev->power.direct_complete)
>                 goto Complete;
>
> -       dpm_wait_for_children(dev, async);
> -
>         if (dev->pm_domain) {
>                 info = "late power domain ";
>                 callback = pm_late_early_op(&dev->pm_domain->ops, state);

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails
  2016-11-10  1:53       ` Rafael J. Wysocki
@ 2016-11-10  2:00         ` Brian Norris
  2016-11-11  1:42           ` Rafael J. Wysocki
  0 siblings, 1 reply; 25+ messages in thread
From: Brian Norris @ 2016-11-10  2:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J . Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Linux Kernel Mailing List, Doug Anderson, Brian Norris,
	Jeffy Chen, Linux PM, Chuansheng Liu, Dmitry Torokhov

On Thu, Nov 10, 2016 at 02:53:20AM +0100, Rafael J. Wysocki wrote:
> On Thu, Nov 10, 2016 at 2:21 AM, Brian Norris <briannorris@chromium.org> wrote:
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index c58563581345..57a8ca4bc8ab 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -1027,6 +1027,8 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
> >         TRACE_DEVICE(dev);
> >         TRACE_SUSPEND(0);
> >
> > +       dpm_wait_for_children(dev, async);
> > +
> 
> On a second thought. I'd move the
> 
> if (dev->power.syscore || dev->power.direct_complete)
> 
> along with this (and put it in front), because those flags won't
> change while children are being waited on anyway.

I can do that, but is it really necessary? It's also not the order we do
it for __device_suspend(). I don't like arbitrarily making optimizations
in this code differently to the non-{noirq,late} versions.

Also, would it cause any problem to have a parent return success before
its children have suspended? I haven't reasoned through all the cases
there, but I wouldn't do that without reason.

Brian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails
  2016-11-10  2:00         ` Brian Norris
@ 2016-11-11  1:42           ` Rafael J. Wysocki
  0 siblings, 0 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-11-11  1:42 UTC (permalink / raw)
  To: Brian Norris
  Cc: Rafael J. Wysocki, Rafael J . Wysocki, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Linux Kernel Mailing List, Doug Anderson,
	Brian Norris, Jeffy Chen, Linux PM, Chuansheng Liu,
	Dmitry Torokhov

On Thu, Nov 10, 2016 at 3:00 AM, Brian Norris <briannorris@chromium.org> wrote:
> On Thu, Nov 10, 2016 at 02:53:20AM +0100, Rafael J. Wysocki wrote:
>> On Thu, Nov 10, 2016 at 2:21 AM, Brian Norris <briannorris@chromium.org> wrote:
>> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>> > index c58563581345..57a8ca4bc8ab 100644
>> > --- a/drivers/base/power/main.c
>> > +++ b/drivers/base/power/main.c
>> > @@ -1027,6 +1027,8 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
>> >         TRACE_DEVICE(dev);
>> >         TRACE_SUSPEND(0);
>> >
>> > +       dpm_wait_for_children(dev, async);
>> > +
>>
>> On a second thought. I'd move the
>>
>> if (dev->power.syscore || dev->power.direct_complete)
>>
>> along with this (and put it in front), because those flags won't
>> change while children are being waited on anyway.
>
> I can do that, but is it really necessary? It's also not the order we do
> it for __device_suspend(). I don't like arbitrarily making optimizations
> in this code differently to the non-{noirq,late} versions.

Good point.

Applied, thanks!

Rafael

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-11-11  1:42 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-20  0:26 [RESEND PATCH 1/2] PM / sleep: print function name of callbacks Brian Norris
2016-10-20  0:26 ` Brian Norris
2016-10-20  0:26 ` [PATCH 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,early} fails Brian Norris
2016-10-20  0:26   ` Brian Norris
2016-10-20  0:46   ` Brian Norris
2016-10-27 15:34     ` Greg Kroah-Hartman
2016-10-27 16:03       ` Brian Norris
2016-10-20  0:56   ` Dmitry Torokhov
2016-10-27 16:05   ` [PATCH v2 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails Brian Norris
2016-10-27 16:05     ` Brian Norris
2016-11-01  4:25     ` Rafael J. Wysocki
2016-11-01  5:22       ` Brian Norris
2016-11-01  6:04       ` Dmitry Torokhov
2016-11-02  3:51         ` Rafael J. Wysocki
2016-11-02  5:07           ` Brian Norris
2016-11-10  0:08             ` Rafael J. Wysocki
2016-11-10  0:18               ` Brian Norris
2016-11-10  1:21     ` [PATCH v3] " Brian Norris
2016-11-10  1:21       ` Brian Norris
2016-11-10  1:53       ` Rafael J. Wysocki
2016-11-10  2:00         ` Brian Norris
2016-11-11  1:42           ` Rafael J. Wysocki
2016-10-20  0:52 ` [RESEND PATCH 1/2] PM / sleep: print function name of callbacks Dmitry Torokhov
2016-11-01  4:27 ` Rafael J. Wysocki
2016-11-02 21:02   ` Brian Norris

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.