All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
@ 2010-09-02  2:54 Colin Cross
  2010-09-02 13:50 ` Alan Stern
  2010-09-02 13:50 ` Alan Stern
  0 siblings, 2 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-02  2:54 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: Colin Cross, Pavel Machek, Rafael J. Wysocki, Len Brown,
	Greg Kroah-Hartman, Alan Stern, Randy Dunlap, Andrew Morton

Only wait on a parent device during resume if the parent device is
suspended.

Consider three drivers, A, B, and C.  The parent of A is C, and C
has async_suspend set.  On boot, C->power.completion is initialized
to 0.

During the first suspend:
suspend_devices_and_enter(...)
 dpm_resume(...)
  device_suspend(A)
  device_suspend(B) returns error, aborts suspend
 dpm_resume_end(...)
   dpm_resume(...)
    device_resume(A)
     dpm_wait(A->parent == C)
      wait_for_completion(C->power.completion)

The wait_for_completion will never complete, because
complete_all(C->power.completion) will only be called from
device_suspend(C) or device_resume(C), neither of which is called
if suspend is aborted before C.

After a successful suspend->resume cycle, where B doesn't abort
suspend, C->power.completion is left in the completed state by the
call to device_resume(C), and the same call path will work if B
aborts suspend.

Signed-off-by: Colin Cross <ccross@android.com>
---
 drivers/base/power/main.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index cb784a0..e159910 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
-	dpm_wait(dev->parent, async);
+	if (dev->parent && dev->parent->power.status >= DPM_OFF)
+		dpm_wait(dev->parent, async);
 	device_lock(dev);
 
 	dev->power.status = DPM_RESUMING;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02  2:54 [PATCH] PM: Prevent waiting forever on asynchronous resume after abort Colin Cross
  2010-09-02 13:50 ` Alan Stern
@ 2010-09-02 13:50 ` Alan Stern
  2010-09-02 19:46   ` Rafael J. Wysocki
  2010-09-02 19:46   ` Rafael J. Wysocki
  1 sibling, 2 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-02 13:50 UTC (permalink / raw)
  To: Colin Cross
  Cc: linux-kernel, linux-pm, Pavel Machek, Rafael J. Wysocki,
	Len Brown, Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Wed, 1 Sep 2010, Colin Cross wrote:

> Only wait on a parent device during resume if the parent device is
> suspended.
> 
> Consider three drivers, A, B, and C.  The parent of A is C, and C
> has async_suspend set.  On boot, C->power.completion is initialized
> to 0.
> 
> During the first suspend:
> suspend_devices_and_enter(...)
>  dpm_resume(...)
>   device_suspend(A)
>   device_suspend(B) returns error, aborts suspend
>  dpm_resume_end(...)
>    dpm_resume(...)
>     device_resume(A)
>      dpm_wait(A->parent == C)
>       wait_for_completion(C->power.completion)
> 
> The wait_for_completion will never complete, because
> complete_all(C->power.completion) will only be called from
> device_suspend(C) or device_resume(C), neither of which is called
> if suspend is aborted before C.

This would work okay if C->power.completion had been initialized to the 
completed state during boot, right?

> After a successful suspend->resume cycle, where B doesn't abort
> suspend, C->power.completion is left in the completed state by the
> call to device_resume(C), and the same call path will work if B
> aborts suspend.
> 
> Signed-off-by: Colin Cross <ccross@android.com>
> ---
>  drivers/base/power/main.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index cb784a0..e159910 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
>  	TRACE_DEVICE(dev);
>  	TRACE_RESUME(0);
>  
> -	dpm_wait(dev->parent, async);
> +	if (dev->parent && dev->parent->power.status >= DPM_OFF)
> +		dpm_wait(dev->parent, async);
>  	device_lock(dev);
>  
>  	dev->power.status = DPM_RESUMING;

I think it would be better to change device_pm_init() and add a 
complete_all().

Alan Stern


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02  2:54 [PATCH] PM: Prevent waiting forever on asynchronous resume after abort Colin Cross
@ 2010-09-02 13:50 ` Alan Stern
  2010-09-02 13:50 ` Alan Stern
  1 sibling, 0 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-02 13:50 UTC (permalink / raw)
  To: Colin Cross
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Wed, 1 Sep 2010, Colin Cross wrote:

> Only wait on a parent device during resume if the parent device is
> suspended.
> 
> Consider three drivers, A, B, and C.  The parent of A is C, and C
> has async_suspend set.  On boot, C->power.completion is initialized
> to 0.
> 
> During the first suspend:
> suspend_devices_and_enter(...)
>  dpm_resume(...)
>   device_suspend(A)
>   device_suspend(B) returns error, aborts suspend
>  dpm_resume_end(...)
>    dpm_resume(...)
>     device_resume(A)
>      dpm_wait(A->parent == C)
>       wait_for_completion(C->power.completion)
> 
> The wait_for_completion will never complete, because
> complete_all(C->power.completion) will only be called from
> device_suspend(C) or device_resume(C), neither of which is called
> if suspend is aborted before C.

This would work okay if C->power.completion had been initialized to the 
completed state during boot, right?

> After a successful suspend->resume cycle, where B doesn't abort
> suspend, C->power.completion is left in the completed state by the
> call to device_resume(C), and the same call path will work if B
> aborts suspend.
> 
> Signed-off-by: Colin Cross <ccross@android.com>
> ---
>  drivers/base/power/main.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index cb784a0..e159910 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
>  	TRACE_DEVICE(dev);
>  	TRACE_RESUME(0);
>  
> -	dpm_wait(dev->parent, async);
> +	if (dev->parent && dev->parent->power.status >= DPM_OFF)
> +		dpm_wait(dev->parent, async);
>  	device_lock(dev);
>  
>  	dev->power.status = DPM_RESUMING;

I think it would be better to change device_pm_init() and add a 
complete_all().

Alan Stern

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 13:50 ` Alan Stern
  2010-09-02 19:46   ` Rafael J. Wysocki
@ 2010-09-02 19:46   ` Rafael J. Wysocki
  2010-09-02 20:24     ` Colin Cross
                       ` (3 more replies)
  1 sibling, 4 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-02 19:46 UTC (permalink / raw)
  To: Alan Stern
  Cc: Colin Cross, linux-kernel, linux-pm, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thursday, September 02, 2010, Alan Stern wrote:
> On Wed, 1 Sep 2010, Colin Cross wrote:
> 
> > Only wait on a parent device during resume if the parent device is
> > suspended.
> > 
> > Consider three drivers, A, B, and C.  The parent of A is C, and C
> > has async_suspend set.  On boot, C->power.completion is initialized
> > to 0.
> > 
> > During the first suspend:
> > suspend_devices_and_enter(...)
> >  dpm_resume(...)
> >   device_suspend(A)
> >   device_suspend(B) returns error, aborts suspend
> >  dpm_resume_end(...)
> >    dpm_resume(...)
> >     device_resume(A)
> >      dpm_wait(A->parent == C)
> >       wait_for_completion(C->power.completion)
> > 
> > The wait_for_completion will never complete, because
> > complete_all(C->power.completion) will only be called from
> > device_suspend(C) or device_resume(C), neither of which is called
> > if suspend is aborted before C.
> 
> This would work okay if C->power.completion had been initialized to the 
> completed state during boot, right?
> 
> > After a successful suspend->resume cycle, where B doesn't abort
> > suspend, C->power.completion is left in the completed state by the
> > call to device_resume(C), and the same call path will work if B
> > aborts suspend.
> > 
> > Signed-off-by: Colin Cross <ccross@android.com>
> > ---
> >  drivers/base/power/main.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index cb784a0..e159910 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
> >  	TRACE_DEVICE(dev);
> >  	TRACE_RESUME(0);
> >  
> > -	dpm_wait(dev->parent, async);
> > +	if (dev->parent && dev->parent->power.status >= DPM_OFF)
> > +		dpm_wait(dev->parent, async);
> >  	device_lock(dev);
> >  
> >  	dev->power.status = DPM_RESUMING;
> 
> I think it would be better to change device_pm_init() and add a 
> complete_all().

I agree.

Who's writing the patch?

Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 13:50 ` Alan Stern
@ 2010-09-02 19:46   ` Rafael J. Wysocki
  2010-09-02 19:46   ` Rafael J. Wysocki
  1 sibling, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-02 19:46 UTC (permalink / raw)
  To: Alan Stern
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	Colin Cross, linux-pm, Andrew Morton

On Thursday, September 02, 2010, Alan Stern wrote:
> On Wed, 1 Sep 2010, Colin Cross wrote:
> 
> > Only wait on a parent device during resume if the parent device is
> > suspended.
> > 
> > Consider three drivers, A, B, and C.  The parent of A is C, and C
> > has async_suspend set.  On boot, C->power.completion is initialized
> > to 0.
> > 
> > During the first suspend:
> > suspend_devices_and_enter(...)
> >  dpm_resume(...)
> >   device_suspend(A)
> >   device_suspend(B) returns error, aborts suspend
> >  dpm_resume_end(...)
> >    dpm_resume(...)
> >     device_resume(A)
> >      dpm_wait(A->parent == C)
> >       wait_for_completion(C->power.completion)
> > 
> > The wait_for_completion will never complete, because
> > complete_all(C->power.completion) will only be called from
> > device_suspend(C) or device_resume(C), neither of which is called
> > if suspend is aborted before C.
> 
> This would work okay if C->power.completion had been initialized to the 
> completed state during boot, right?
> 
> > After a successful suspend->resume cycle, where B doesn't abort
> > suspend, C->power.completion is left in the completed state by the
> > call to device_resume(C), and the same call path will work if B
> > aborts suspend.
> > 
> > Signed-off-by: Colin Cross <ccross@android.com>
> > ---
> >  drivers/base/power/main.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index cb784a0..e159910 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
> >  	TRACE_DEVICE(dev);
> >  	TRACE_RESUME(0);
> >  
> > -	dpm_wait(dev->parent, async);
> > +	if (dev->parent && dev->parent->power.status >= DPM_OFF)
> > +		dpm_wait(dev->parent, async);
> >  	device_lock(dev);
> >  
> >  	dev->power.status = DPM_RESUMING;
> 
> I think it would be better to change device_pm_init() and add a 
> complete_all().

I agree.

Who's writing the patch?

Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 19:46   ` Rafael J. Wysocki
@ 2010-09-02 20:24     ` Colin Cross
  2010-09-02 20:30       ` Rafael J. Wysocki
                         ` (4 more replies)
  2010-09-02 20:24     ` Colin Cross
                       ` (2 subsequent siblings)
  3 siblings, 5 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-02 20:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, linux-kernel, linux-pm, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thu, Sep 2, 2010 at 12:46 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Thursday, September 02, 2010, Alan Stern wrote:
>> On Wed, 1 Sep 2010, Colin Cross wrote:
>>
>> > Only wait on a parent device during resume if the parent device is
>> > suspended.
>> >
>> > Consider three drivers, A, B, and C.  The parent of A is C, and C
>> > has async_suspend set.  On boot, C->power.completion is initialized
>> > to 0.
>> >
>> > During the first suspend:
>> > suspend_devices_and_enter(...)
>> >  dpm_resume(...)
>> >   device_suspend(A)
>> >   device_suspend(B) returns error, aborts suspend
>> >  dpm_resume_end(...)
>> >    dpm_resume(...)
>> >     device_resume(A)
>> >      dpm_wait(A->parent == C)
>> >       wait_for_completion(C->power.completion)
>> >
>> > The wait_for_completion will never complete, because
>> > complete_all(C->power.completion) will only be called from
>> > device_suspend(C) or device_resume(C), neither of which is called
>> > if suspend is aborted before C.
>>
>> This would work okay if C->power.completion had been initialized to the
>> completed state during boot, right?
>>
>> > After a successful suspend->resume cycle, where B doesn't abort
>> > suspend, C->power.completion is left in the completed state by the
>> > call to device_resume(C), and the same call path will work if B
>> > aborts suspend.
>> >
>> > Signed-off-by: Colin Cross <ccross@android.com>
>> > ---
>> >  drivers/base/power/main.c |    3 ++-
>> >  1 files changed, 2 insertions(+), 1 deletions(-)
>> >
>> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>> > index cb784a0..e159910 100644
>> > --- a/drivers/base/power/main.c
>> > +++ b/drivers/base/power/main.c
>> > @@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
>> >     TRACE_DEVICE(dev);
>> >     TRACE_RESUME(0);
>> >
>> > -   dpm_wait(dev->parent, async);
>> > +   if (dev->parent && dev->parent->power.status >= DPM_OFF)
>> > +           dpm_wait(dev->parent, async);
>> >     device_lock(dev);
>> >
>> >     dev->power.status = DPM_RESUMING;
>>
>> I think it would be better to change device_pm_init() and add a
>> complete_all().
>
> I agree.
That would work, and was my first solution, but it increases the
reliance on the completion variable being left completed between state
transitions, which is undocumented and unnecessary.  It seems more
straightforward to me to only wait on the parent if the parent is
suspended.

> Who's writing the patch?
I'll write it if you still don't like this one.


> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 19:46   ` Rafael J. Wysocki
  2010-09-02 20:24     ` Colin Cross
@ 2010-09-02 20:24     ` Colin Cross
  2010-09-02 20:27     ` Alan Stern
  2010-09-02 20:27     ` Alan Stern
  3 siblings, 0 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-02 20:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thu, Sep 2, 2010 at 12:46 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Thursday, September 02, 2010, Alan Stern wrote:
>> On Wed, 1 Sep 2010, Colin Cross wrote:
>>
>> > Only wait on a parent device during resume if the parent device is
>> > suspended.
>> >
>> > Consider three drivers, A, B, and C.  The parent of A is C, and C
>> > has async_suspend set.  On boot, C->power.completion is initialized
>> > to 0.
>> >
>> > During the first suspend:
>> > suspend_devices_and_enter(...)
>> >  dpm_resume(...)
>> >   device_suspend(A)
>> >   device_suspend(B) returns error, aborts suspend
>> >  dpm_resume_end(...)
>> >    dpm_resume(...)
>> >     device_resume(A)
>> >      dpm_wait(A->parent == C)
>> >       wait_for_completion(C->power.completion)
>> >
>> > The wait_for_completion will never complete, because
>> > complete_all(C->power.completion) will only be called from
>> > device_suspend(C) or device_resume(C), neither of which is called
>> > if suspend is aborted before C.
>>
>> This would work okay if C->power.completion had been initialized to the
>> completed state during boot, right?
>>
>> > After a successful suspend->resume cycle, where B doesn't abort
>> > suspend, C->power.completion is left in the completed state by the
>> > call to device_resume(C), and the same call path will work if B
>> > aborts suspend.
>> >
>> > Signed-off-by: Colin Cross <ccross@android.com>
>> > ---
>> >  drivers/base/power/main.c |    3 ++-
>> >  1 files changed, 2 insertions(+), 1 deletions(-)
>> >
>> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>> > index cb784a0..e159910 100644
>> > --- a/drivers/base/power/main.c
>> > +++ b/drivers/base/power/main.c
>> > @@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
>> >     TRACE_DEVICE(dev);
>> >     TRACE_RESUME(0);
>> >
>> > -   dpm_wait(dev->parent, async);
>> > +   if (dev->parent && dev->parent->power.status >= DPM_OFF)
>> > +           dpm_wait(dev->parent, async);
>> >     device_lock(dev);
>> >
>> >     dev->power.status = DPM_RESUMING;
>>
>> I think it would be better to change device_pm_init() and add a
>> complete_all().
>
> I agree.
That would work, and was my first solution, but it increases the
reliance on the completion variable being left completed between state
transitions, which is undocumented and unnecessary.  It seems more
straightforward to me to only wait on the parent if the parent is
suspended.

> Who's writing the patch?
I'll write it if you still don't like this one.


> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 19:46   ` Rafael J. Wysocki
  2010-09-02 20:24     ` Colin Cross
  2010-09-02 20:24     ` Colin Cross
@ 2010-09-02 20:27     ` Alan Stern
  2010-09-02 20:27     ` Alan Stern
  3 siblings, 0 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-02 20:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Colin Cross, linux-kernel, linux-pm, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thu, 2 Sep 2010, Rafael J. Wysocki wrote:

> On Thursday, September 02, 2010, Alan Stern wrote:
> > On Wed, 1 Sep 2010, Colin Cross wrote:
> > 
> > > Only wait on a parent device during resume if the parent device is
> > > suspended.

> > I think it would be better to change device_pm_init() and add a 
> > complete_all().
> 
> I agree.
> 
> Who's writing the patch?

I was hoping that Colin would.  But I can do it if he doesn't want to.

Alan Stern


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 19:46   ` Rafael J. Wysocki
                       ` (2 preceding siblings ...)
  2010-09-02 20:27     ` Alan Stern
@ 2010-09-02 20:27     ` Alan Stern
  3 siblings, 0 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-02 20:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	Colin Cross, linux-pm, Andrew Morton

On Thu, 2 Sep 2010, Rafael J. Wysocki wrote:

> On Thursday, September 02, 2010, Alan Stern wrote:
> > On Wed, 1 Sep 2010, Colin Cross wrote:
> > 
> > > Only wait on a parent device during resume if the parent device is
> > > suspended.

> > I think it would be better to change device_pm_init() and add a 
> > complete_all().
> 
> I agree.
> 
> Who's writing the patch?

I was hoping that Colin would.  But I can do it if he doesn't want to.

Alan Stern

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 20:24     ` Colin Cross
@ 2010-09-02 20:30       ` Rafael J. Wysocki
  2010-09-02 20:30       ` Rafael J. Wysocki
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-02 20:30 UTC (permalink / raw)
  To: Colin Cross
  Cc: Alan Stern, linux-kernel, linux-pm, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thursday, September 02, 2010, Colin Cross wrote:
> On Thu, Sep 2, 2010 at 12:46 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Thursday, September 02, 2010, Alan Stern wrote:
> >> On Wed, 1 Sep 2010, Colin Cross wrote:
> >>
> >> > Only wait on a parent device during resume if the parent device is
> >> > suspended.
> >> >
> >> > Consider three drivers, A, B, and C.  The parent of A is C, and C
> >> > has async_suspend set.  On boot, C->power.completion is initialized
> >> > to 0.
> >> >
> >> > During the first suspend:
> >> > suspend_devices_and_enter(...)
> >> >  dpm_resume(...)
> >> >   device_suspend(A)
> >> >   device_suspend(B) returns error, aborts suspend
> >> >  dpm_resume_end(...)
> >> >    dpm_resume(...)
> >> >     device_resume(A)
> >> >      dpm_wait(A->parent == C)
> >> >       wait_for_completion(C->power.completion)
> >> >
> >> > The wait_for_completion will never complete, because
> >> > complete_all(C->power.completion) will only be called from
> >> > device_suspend(C) or device_resume(C), neither of which is called
> >> > if suspend is aborted before C.
> >>
> >> This would work okay if C->power.completion had been initialized to the
> >> completed state during boot, right?
> >>
> >> > After a successful suspend->resume cycle, where B doesn't abort
> >> > suspend, C->power.completion is left in the completed state by the
> >> > call to device_resume(C), and the same call path will work if B
> >> > aborts suspend.
> >> >
> >> > Signed-off-by: Colin Cross <ccross@android.com>
> >> > ---
> >> >  drivers/base/power/main.c |    3 ++-
> >> >  1 files changed, 2 insertions(+), 1 deletions(-)
> >> >
> >> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> >> > index cb784a0..e159910 100644
> >> > --- a/drivers/base/power/main.c
> >> > +++ b/drivers/base/power/main.c
> >> > @@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
> >> >     TRACE_DEVICE(dev);
> >> >     TRACE_RESUME(0);
> >> >
> >> > -   dpm_wait(dev->parent, async);
> >> > +   if (dev->parent && dev->parent->power.status >= DPM_OFF)
> >> > +           dpm_wait(dev->parent, async);
> >> >     device_lock(dev);
> >> >
> >> >     dev->power.status = DPM_RESUMING;
> >>
> >> I think it would be better to change device_pm_init() and add a
> >> complete_all().
> >
> > I agree.
> That would work, and was my first solution, but it increases the
> reliance on the completion variable being left completed between state
> transitions, which is undocumented and unnecessary.  It seems more
> straightforward to me to only wait on the parent if the parent is
> suspended.
> 
> > Who's writing the patch?
> I'll write it if you still don't like this one.

Well, I need to think about that a little bit more.

Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 20:24     ` Colin Cross
  2010-09-02 20:30       ` Rafael J. Wysocki
@ 2010-09-02 20:30       ` Rafael J. Wysocki
  2010-09-02 20:45       ` Alan Stern
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-02 20:30 UTC (permalink / raw)
  To: Colin Cross
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thursday, September 02, 2010, Colin Cross wrote:
> On Thu, Sep 2, 2010 at 12:46 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Thursday, September 02, 2010, Alan Stern wrote:
> >> On Wed, 1 Sep 2010, Colin Cross wrote:
> >>
> >> > Only wait on a parent device during resume if the parent device is
> >> > suspended.
> >> >
> >> > Consider three drivers, A, B, and C.  The parent of A is C, and C
> >> > has async_suspend set.  On boot, C->power.completion is initialized
> >> > to 0.
> >> >
> >> > During the first suspend:
> >> > suspend_devices_and_enter(...)
> >> >  dpm_resume(...)
> >> >   device_suspend(A)
> >> >   device_suspend(B) returns error, aborts suspend
> >> >  dpm_resume_end(...)
> >> >    dpm_resume(...)
> >> >     device_resume(A)
> >> >      dpm_wait(A->parent == C)
> >> >       wait_for_completion(C->power.completion)
> >> >
> >> > The wait_for_completion will never complete, because
> >> > complete_all(C->power.completion) will only be called from
> >> > device_suspend(C) or device_resume(C), neither of which is called
> >> > if suspend is aborted before C.
> >>
> >> This would work okay if C->power.completion had been initialized to the
> >> completed state during boot, right?
> >>
> >> > After a successful suspend->resume cycle, where B doesn't abort
> >> > suspend, C->power.completion is left in the completed state by the
> >> > call to device_resume(C), and the same call path will work if B
> >> > aborts suspend.
> >> >
> >> > Signed-off-by: Colin Cross <ccross@android.com>
> >> > ---
> >> >  drivers/base/power/main.c |    3 ++-
> >> >  1 files changed, 2 insertions(+), 1 deletions(-)
> >> >
> >> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> >> > index cb784a0..e159910 100644
> >> > --- a/drivers/base/power/main.c
> >> > +++ b/drivers/base/power/main.c
> >> > @@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
> >> >     TRACE_DEVICE(dev);
> >> >     TRACE_RESUME(0);
> >> >
> >> > -   dpm_wait(dev->parent, async);
> >> > +   if (dev->parent && dev->parent->power.status >= DPM_OFF)
> >> > +           dpm_wait(dev->parent, async);
> >> >     device_lock(dev);
> >> >
> >> >     dev->power.status = DPM_RESUMING;
> >>
> >> I think it would be better to change device_pm_init() and add a
> >> complete_all().
> >
> > I agree.
> That would work, and was my first solution, but it increases the
> reliance on the completion variable being left completed between state
> transitions, which is undocumented and unnecessary.  It seems more
> straightforward to me to only wait on the parent if the parent is
> suspended.
> 
> > Who's writing the patch?
> I'll write it if you still don't like this one.

Well, I need to think about that a little bit more.

Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 20:24     ` Colin Cross
                         ` (2 preceding siblings ...)
  2010-09-02 20:45       ` Alan Stern
@ 2010-09-02 20:45       ` Alan Stern
  2010-09-02 21:01         ` Colin Cross
  2010-09-02 21:01         ` Colin Cross
  2010-09-02 21:05         ` Rafael J. Wysocki
  4 siblings, 2 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-02 20:45 UTC (permalink / raw)
  To: Colin Cross
  Cc: Rafael J. Wysocki, linux-kernel, linux-pm, Pavel Machek,
	Len Brown, Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thu, 2 Sep 2010, Colin Cross wrote:

> >> I think it would be better to change device_pm_init() and add a
> >> complete_all().
> >
> > I agree.
> That would work, and was my first solution, but it increases the
> reliance on the completion variable being left completed between state
> transitions, which is undocumented and unnecessary.  It seems more
> straightforward to me to only wait on the parent if the parent is
> suspended.

How about calling complete_all() from within dpm_prepare() as well?  
Then it will get initialized properly at the beginning of every sleep 
transition.

Alan Stern


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 20:24     ` Colin Cross
  2010-09-02 20:30       ` Rafael J. Wysocki
  2010-09-02 20:30       ` Rafael J. Wysocki
@ 2010-09-02 20:45       ` Alan Stern
  2010-09-02 20:45       ` Alan Stern
  2010-09-02 21:05         ` Rafael J. Wysocki
  4 siblings, 0 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-02 20:45 UTC (permalink / raw)
  To: Colin Cross
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thu, 2 Sep 2010, Colin Cross wrote:

> >> I think it would be better to change device_pm_init() and add a
> >> complete_all().
> >
> > I agree.
> That would work, and was my first solution, but it increases the
> reliance on the completion variable being left completed between state
> transitions, which is undocumented and unnecessary.  It seems more
> straightforward to me to only wait on the parent if the parent is
> suspended.

How about calling complete_all() from within dpm_prepare() as well?  
Then it will get initialized properly at the beginning of every sleep 
transition.

Alan Stern

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 20:45       ` Alan Stern
  2010-09-02 21:01         ` Colin Cross
@ 2010-09-02 21:01         ` Colin Cross
  2010-09-02 21:06             ` Rafael J. Wysocki
                             ` (2 more replies)
  1 sibling, 3 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-02 21:01 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, linux-kernel, linux-pm, Pavel Machek,
	Len Brown, Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thu, Sep 2, 2010 at 1:45 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 2 Sep 2010, Colin Cross wrote:
>
>> >> I think it would be better to change device_pm_init() and add a
>> >> complete_all().
>> >
>> > I agree.
>> That would work, and was my first solution, but it increases the
>> reliance on the completion variable being left completed between state
>> transitions, which is undocumented and unnecessary.  It seems more
>> straightforward to me to only wait on the parent if the parent is
>> suspended.
>
> How about calling complete_all() from within dpm_prepare() as well?
> Then it will get initialized properly at the beginning of every sleep
> transition.
That would work, but I still don't see why it's better.  With either
of your changes, the power.completion variable is storing state, and
not just used for notification.  However, the exact meaning of that
state is unclear, especially during the transition from an aborted
suspend to resume, and the state is duplicating power.status.  Setting
it to complete in dpm_prepare is especially confusing, because at that
point nothing is completed, it hasn't even been started.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 20:45       ` Alan Stern
@ 2010-09-02 21:01         ` Colin Cross
  2010-09-02 21:01         ` Colin Cross
  1 sibling, 0 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-02 21:01 UTC (permalink / raw)
  To: Alan Stern
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thu, Sep 2, 2010 at 1:45 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 2 Sep 2010, Colin Cross wrote:
>
>> >> I think it would be better to change device_pm_init() and add a
>> >> complete_all().
>> >
>> > I agree.
>> That would work, and was my first solution, but it increases the
>> reliance on the completion variable being left completed between state
>> transitions, which is undocumented and unnecessary.  It seems more
>> straightforward to me to only wait on the parent if the parent is
>> suspended.
>
> How about calling complete_all() from within dpm_prepare() as well?
> Then it will get initialized properly at the beginning of every sleep
> transition.
That would work, but I still don't see why it's better.  With either
of your changes, the power.completion variable is storing state, and
not just used for notification.  However, the exact meaning of that
state is unclear, especially during the transition from an aborted
suspend to resume, and the state is duplicating power.status.  Setting
it to complete in dpm_prepare is especially confusing, because at that
point nothing is completed, it hasn't even been started.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 20:24     ` Colin Cross
@ 2010-09-02 21:05         ` Rafael J. Wysocki
  2010-09-02 20:30       ` Rafael J. Wysocki
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-02 21:05 UTC (permalink / raw)
  To: Colin Cross
  Cc: Alan Stern, linux-kernel, linux-pm, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thursday, September 02, 2010, Colin Cross wrote:
> On Thu, Sep 2, 2010 at 12:46 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Thursday, September 02, 2010, Alan Stern wrote:
> >> On Wed, 1 Sep 2010, Colin Cross wrote:
> >>
> >> > Only wait on a parent device during resume if the parent device is
> >> > suspended.
> >> >
> >> > Consider three drivers, A, B, and C.  The parent of A is C, and C
> >> > has async_suspend set.  On boot, C->power.completion is initialized
> >> > to 0.
> >> >
> >> > During the first suspend:
> >> > suspend_devices_and_enter(...)
> >> >  dpm_resume(...)
> >> >   device_suspend(A)
> >> >   device_suspend(B) returns error, aborts suspend
> >> >  dpm_resume_end(...)
> >> >    dpm_resume(...)
> >> >     device_resume(A)
> >> >      dpm_wait(A->parent == C)
> >> >       wait_for_completion(C->power.completion)
> >> >
> >> > The wait_for_completion will never complete, because
> >> > complete_all(C->power.completion) will only be called from
> >> > device_suspend(C) or device_resume(C), neither of which is called
> >> > if suspend is aborted before C.
> >>
> >> This would work okay if C->power.completion had been initialized to the
> >> completed state during boot, right?
> >>
> >> > After a successful suspend->resume cycle, where B doesn't abort
> >> > suspend, C->power.completion is left in the completed state by the
> >> > call to device_resume(C), and the same call path will work if B
> >> > aborts suspend.
> >> >
> >> > Signed-off-by: Colin Cross <ccross@android.com>
> >> > ---
> >> >  drivers/base/power/main.c |    3 ++-
> >> >  1 files changed, 2 insertions(+), 1 deletions(-)
> >> >
> >> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> >> > index cb784a0..e159910 100644
> >> > --- a/drivers/base/power/main.c
> >> > +++ b/drivers/base/power/main.c
> >> > @@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
> >> >     TRACE_DEVICE(dev);
> >> >     TRACE_RESUME(0);
> >> >
> >> > -   dpm_wait(dev->parent, async);
> >> > +   if (dev->parent && dev->parent->power.status >= DPM_OFF)
> >> > +           dpm_wait(dev->parent, async);
> >> >     device_lock(dev);
> >> >
> >> >     dev->power.status = DPM_RESUMING;
> >>
> >> I think it would be better to change device_pm_init() and add a
> >> complete_all().
> >
> > I agree.
> That would work, and was my first solution, but it increases the
> reliance on the completion variable being left completed between state
> transitions, which is undocumented and unnecessary.

In fact it is necessary, because dpm_wait() may be called by external code
through device_pm_wait_for_dev() which is exported for a reason.  That may
lead to problems analogous to the one you described if the completion
variables are not completed initially.

> It seems more straightforward to me to only wait on the parent if the parent is
> suspended.
> 
> > Who's writing the patch?
> I'll write it if you still don't like this one.

Yes, please.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
@ 2010-09-02 21:05         ` Rafael J. Wysocki
  0 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-02 21:05 UTC (permalink / raw)
  To: Colin Cross
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thursday, September 02, 2010, Colin Cross wrote:
> On Thu, Sep 2, 2010 at 12:46 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Thursday, September 02, 2010, Alan Stern wrote:
> >> On Wed, 1 Sep 2010, Colin Cross wrote:
> >>
> >> > Only wait on a parent device during resume if the parent device is
> >> > suspended.
> >> >
> >> > Consider three drivers, A, B, and C.  The parent of A is C, and C
> >> > has async_suspend set.  On boot, C->power.completion is initialized
> >> > to 0.
> >> >
> >> > During the first suspend:
> >> > suspend_devices_and_enter(...)
> >> >  dpm_resume(...)
> >> >   device_suspend(A)
> >> >   device_suspend(B) returns error, aborts suspend
> >> >  dpm_resume_end(...)
> >> >    dpm_resume(...)
> >> >     device_resume(A)
> >> >      dpm_wait(A->parent == C)
> >> >       wait_for_completion(C->power.completion)
> >> >
> >> > The wait_for_completion will never complete, because
> >> > complete_all(C->power.completion) will only be called from
> >> > device_suspend(C) or device_resume(C), neither of which is called
> >> > if suspend is aborted before C.
> >>
> >> This would work okay if C->power.completion had been initialized to the
> >> completed state during boot, right?
> >>
> >> > After a successful suspend->resume cycle, where B doesn't abort
> >> > suspend, C->power.completion is left in the completed state by the
> >> > call to device_resume(C), and the same call path will work if B
> >> > aborts suspend.
> >> >
> >> > Signed-off-by: Colin Cross <ccross@android.com>
> >> > ---
> >> >  drivers/base/power/main.c |    3 ++-
> >> >  1 files changed, 2 insertions(+), 1 deletions(-)
> >> >
> >> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> >> > index cb784a0..e159910 100644
> >> > --- a/drivers/base/power/main.c
> >> > +++ b/drivers/base/power/main.c
> >> > @@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
> >> >     TRACE_DEVICE(dev);
> >> >     TRACE_RESUME(0);
> >> >
> >> > -   dpm_wait(dev->parent, async);
> >> > +   if (dev->parent && dev->parent->power.status >= DPM_OFF)
> >> > +           dpm_wait(dev->parent, async);
> >> >     device_lock(dev);
> >> >
> >> >     dev->power.status = DPM_RESUMING;
> >>
> >> I think it would be better to change device_pm_init() and add a
> >> complete_all().
> >
> > I agree.
> That would work, and was my first solution, but it increases the
> reliance on the completion variable being left completed between state
> transitions, which is undocumented and unnecessary.

In fact it is necessary, because dpm_wait() may be called by external code
through device_pm_wait_for_dev() which is exported for a reason.  That may
lead to problems analogous to the one you described if the completion
variables are not completed initially.

> It seems more straightforward to me to only wait on the parent if the parent is
> suspended.
> 
> > Who's writing the patch?
> I'll write it if you still don't like this one.

Yes, please.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 21:01         ` Colin Cross
@ 2010-09-02 21:06             ` Rafael J. Wysocki
  2010-09-02 21:34           ` Alan Stern
  2010-09-02 21:34           ` Alan Stern
  2 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-02 21:06 UTC (permalink / raw)
  To: Colin Cross
  Cc: Alan Stern, linux-kernel, linux-pm, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thursday, September 02, 2010, Colin Cross wrote:
> On Thu, Sep 2, 2010 at 1:45 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Thu, 2 Sep 2010, Colin Cross wrote:
> >
> >> >> I think it would be better to change device_pm_init() and add a
> >> >> complete_all().
> >> >
> >> > I agree.
> >> That would work, and was my first solution, but it increases the
> >> reliance on the completion variable being left completed between state
> >> transitions, which is undocumented and unnecessary.  It seems more
> >> straightforward to me to only wait on the parent if the parent is
> >> suspended.
> >
> > How about calling complete_all() from within dpm_prepare() as well?
> > Then it will get initialized properly at the beginning of every sleep
> > transition.
> That would work, but I still don't see why it's better.  With either
> of your changes, the power.completion variable is storing state, and
> not just used for notification.  However, the exact meaning of that
> state is unclear, especially during the transition from an aborted
> suspend to resume, and the state is duplicating power.status.  Setting
> it to complete in dpm_prepare is especially confusing, because at that
> point nothing is completed, it hasn't even been started.

It just sets the initial value.  But I agree it would be cleaner to do that
during the initialization.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
@ 2010-09-02 21:06             ` Rafael J. Wysocki
  0 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-02 21:06 UTC (permalink / raw)
  To: Colin Cross
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thursday, September 02, 2010, Colin Cross wrote:
> On Thu, Sep 2, 2010 at 1:45 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Thu, 2 Sep 2010, Colin Cross wrote:
> >
> >> >> I think it would be better to change device_pm_init() and add a
> >> >> complete_all().
> >> >
> >> > I agree.
> >> That would work, and was my first solution, but it increases the
> >> reliance on the completion variable being left completed between state
> >> transitions, which is undocumented and unnecessary.  It seems more
> >> straightforward to me to only wait on the parent if the parent is
> >> suspended.
> >
> > How about calling complete_all() from within dpm_prepare() as well?
> > Then it will get initialized properly at the beginning of every sleep
> > transition.
> That would work, but I still don't see why it's better.  With either
> of your changes, the power.completion variable is storing state, and
> not just used for notification.  However, the exact meaning of that
> state is unclear, especially during the transition from an aborted
> suspend to resume, and the state is duplicating power.status.  Setting
> it to complete in dpm_prepare is especially confusing, because at that
> point nothing is completed, it hasn't even been started.

It just sets the initial value.  But I agree it would be cleaner to do that
during the initialization.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 21:05         ` Rafael J. Wysocki
  (?)
@ 2010-09-02 21:31         ` Colin Cross
  2010-09-02 21:40           ` Rafael J. Wysocki
  2010-09-02 21:40           ` [PATCH] " Rafael J. Wysocki
  -1 siblings, 2 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-02 21:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, linux-kernel, linux-pm, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thu, Sep 2, 2010 at 2:05 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Thursday, September 02, 2010, Colin Cross wrote:
>> On Thu, Sep 2, 2010 at 12:46 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> > On Thursday, September 02, 2010, Alan Stern wrote:
>> >> On Wed, 1 Sep 2010, Colin Cross wrote:
>> >>
>> >> > Only wait on a parent device during resume if the parent device is
>> >> > suspended.
>> >> >
>> >> > Consider three drivers, A, B, and C.  The parent of A is C, and C
>> >> > has async_suspend set.  On boot, C->power.completion is initialized
>> >> > to 0.
>> >> >
>> >> > During the first suspend:
>> >> > suspend_devices_and_enter(...)
>> >> >  dpm_resume(...)
>> >> >   device_suspend(A)
>> >> >   device_suspend(B) returns error, aborts suspend
>> >> >  dpm_resume_end(...)
>> >> >    dpm_resume(...)
>> >> >     device_resume(A)
>> >> >      dpm_wait(A->parent == C)
>> >> >       wait_for_completion(C->power.completion)
>> >> >
>> >> > The wait_for_completion will never complete, because
>> >> > complete_all(C->power.completion) will only be called from
>> >> > device_suspend(C) or device_resume(C), neither of which is called
>> >> > if suspend is aborted before C.
>> >>
>> >> This would work okay if C->power.completion had been initialized to the
>> >> completed state during boot, right?
>> >>
>> >> > After a successful suspend->resume cycle, where B doesn't abort
>> >> > suspend, C->power.completion is left in the completed state by the
>> >> > call to device_resume(C), and the same call path will work if B
>> >> > aborts suspend.
>> >> >
>> >> > Signed-off-by: Colin Cross <ccross@android.com>
>> >> > ---
>> >> >  drivers/base/power/main.c |    3 ++-
>> >> >  1 files changed, 2 insertions(+), 1 deletions(-)
>> >> >
>> >> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>> >> > index cb784a0..e159910 100644
>> >> > --- a/drivers/base/power/main.c
>> >> > +++ b/drivers/base/power/main.c
>> >> > @@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
>> >> >     TRACE_DEVICE(dev);
>> >> >     TRACE_RESUME(0);
>> >> >
>> >> > -   dpm_wait(dev->parent, async);
>> >> > +   if (dev->parent && dev->parent->power.status >= DPM_OFF)
>> >> > +           dpm_wait(dev->parent, async);
>> >> >     device_lock(dev);
>> >> >
>> >> >     dev->power.status = DPM_RESUMING;
>> >>
>> >> I think it would be better to change device_pm_init() and add a
>> >> complete_all().
>> >
>> > I agree.
>> That would work, and was my first solution, but it increases the
>> reliance on the completion variable being left completed between state
>> transitions, which is undocumented and unnecessary.
>
> In fact it is necessary, because dpm_wait() may be called by external code
> through device_pm_wait_for_dev() which is exported for a reason.  That may
> lead to problems analogous to the one you described if the completion
> variables are not completed initially.
>
>> It seems more straightforward to me to only wait on the parent if the parent is
>> suspended.
>>
>> > Who's writing the patch?
>> I'll write it if you still don't like this one.
>
> Yes, please.
OK - do you prefer it in dpm_prepare or device_pm_init?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 21:05         ` Rafael J. Wysocki
  (?)
  (?)
@ 2010-09-02 21:31         ` Colin Cross
  -1 siblings, 0 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-02 21:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thu, Sep 2, 2010 at 2:05 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Thursday, September 02, 2010, Colin Cross wrote:
>> On Thu, Sep 2, 2010 at 12:46 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> > On Thursday, September 02, 2010, Alan Stern wrote:
>> >> On Wed, 1 Sep 2010, Colin Cross wrote:
>> >>
>> >> > Only wait on a parent device during resume if the parent device is
>> >> > suspended.
>> >> >
>> >> > Consider three drivers, A, B, and C.  The parent of A is C, and C
>> >> > has async_suspend set.  On boot, C->power.completion is initialized
>> >> > to 0.
>> >> >
>> >> > During the first suspend:
>> >> > suspend_devices_and_enter(...)
>> >> >  dpm_resume(...)
>> >> >   device_suspend(A)
>> >> >   device_suspend(B) returns error, aborts suspend
>> >> >  dpm_resume_end(...)
>> >> >    dpm_resume(...)
>> >> >     device_resume(A)
>> >> >      dpm_wait(A->parent == C)
>> >> >       wait_for_completion(C->power.completion)
>> >> >
>> >> > The wait_for_completion will never complete, because
>> >> > complete_all(C->power.completion) will only be called from
>> >> > device_suspend(C) or device_resume(C), neither of which is called
>> >> > if suspend is aborted before C.
>> >>
>> >> This would work okay if C->power.completion had been initialized to the
>> >> completed state during boot, right?
>> >>
>> >> > After a successful suspend->resume cycle, where B doesn't abort
>> >> > suspend, C->power.completion is left in the completed state by the
>> >> > call to device_resume(C), and the same call path will work if B
>> >> > aborts suspend.
>> >> >
>> >> > Signed-off-by: Colin Cross <ccross@android.com>
>> >> > ---
>> >> >  drivers/base/power/main.c |    3 ++-
>> >> >  1 files changed, 2 insertions(+), 1 deletions(-)
>> >> >
>> >> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>> >> > index cb784a0..e159910 100644
>> >> > --- a/drivers/base/power/main.c
>> >> > +++ b/drivers/base/power/main.c
>> >> > @@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
>> >> >     TRACE_DEVICE(dev);
>> >> >     TRACE_RESUME(0);
>> >> >
>> >> > -   dpm_wait(dev->parent, async);
>> >> > +   if (dev->parent && dev->parent->power.status >= DPM_OFF)
>> >> > +           dpm_wait(dev->parent, async);
>> >> >     device_lock(dev);
>> >> >
>> >> >     dev->power.status = DPM_RESUMING;
>> >>
>> >> I think it would be better to change device_pm_init() and add a
>> >> complete_all().
>> >
>> > I agree.
>> That would work, and was my first solution, but it increases the
>> reliance on the completion variable being left completed between state
>> transitions, which is undocumented and unnecessary.
>
> In fact it is necessary, because dpm_wait() may be called by external code
> through device_pm_wait_for_dev() which is exported for a reason.  That may
> lead to problems analogous to the one you described if the completion
> variables are not completed initially.
>
>> It seems more straightforward to me to only wait on the parent if the parent is
>> suspended.
>>
>> > Who's writing the patch?
>> I'll write it if you still don't like this one.
>
> Yes, please.
OK - do you prefer it in dpm_prepare or device_pm_init?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 21:01         ` Colin Cross
  2010-09-02 21:06             ` Rafael J. Wysocki
@ 2010-09-02 21:34           ` Alan Stern
  2010-09-02 22:45             ` Colin Cross
  2010-09-02 22:45             ` Colin Cross
  2010-09-02 21:34           ` Alan Stern
  2 siblings, 2 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-02 21:34 UTC (permalink / raw)
  To: Colin Cross
  Cc: Rafael J. Wysocki, linux-kernel, linux-pm, Pavel Machek,
	Len Brown, Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thu, 2 Sep 2010, Colin Cross wrote:

> That would work, but I still don't see why it's better.  With either
> of your changes, the power.completion variable is storing state, and
> not just used for notification.  However, the exact meaning of that
> state is unclear, especially during the transition from an aborted
> suspend to resume, and the state is duplicating power.status.  Setting
> it to complete in dpm_prepare is especially confusing, because at that
> point nothing is completed, it hasn't even been started.

The state being waited for varies from time to time and is only
partially related to power.status.  Instead of using a completion I
suppose we could have used a new "transition_complete" variable
together with a waitqueue.  Would you prefer that?  It's effectively
the same thing as a completion, but without the nice packaging already 
provided by the kernel.

Alan Stern


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 21:01         ` Colin Cross
  2010-09-02 21:06             ` Rafael J. Wysocki
  2010-09-02 21:34           ` Alan Stern
@ 2010-09-02 21:34           ` Alan Stern
  2 siblings, 0 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-02 21:34 UTC (permalink / raw)
  To: Colin Cross
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thu, 2 Sep 2010, Colin Cross wrote:

> That would work, but I still don't see why it's better.  With either
> of your changes, the power.completion variable is storing state, and
> not just used for notification.  However, the exact meaning of that
> state is unclear, especially during the transition from an aborted
> suspend to resume, and the state is duplicating power.status.  Setting
> it to complete in dpm_prepare is especially confusing, because at that
> point nothing is completed, it hasn't even been started.

The state being waited for varies from time to time and is only
partially related to power.status.  Instead of using a completion I
suppose we could have used a new "transition_complete" variable
together with a waitqueue.  Would you prefer that?  It's effectively
the same thing as a completion, but without the nice packaging already 
provided by the kernel.

Alan Stern

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 21:31         ` Colin Cross
@ 2010-09-02 21:40           ` Rafael J. Wysocki
  2010-09-02 22:59             ` [PATCH v2] " Colin Cross
  2010-09-02 22:59             ` Colin Cross
  2010-09-02 21:40           ` [PATCH] " Rafael J. Wysocki
  1 sibling, 2 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-02 21:40 UTC (permalink / raw)
  To: Colin Cross
  Cc: Alan Stern, linux-kernel, linux-pm, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thursday, September 02, 2010, Colin Cross wrote:
> On Thu, Sep 2, 2010 at 2:05 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Thursday, September 02, 2010, Colin Cross wrote:
> >> On Thu, Sep 2, 2010 at 12:46 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> > On Thursday, September 02, 2010, Alan Stern wrote:
> >> >> On Wed, 1 Sep 2010, Colin Cross wrote:
> >> >>
> >> >> > Only wait on a parent device during resume if the parent device is
> >> >> > suspended.
> >> >> >
> >> >> > Consider three drivers, A, B, and C.  The parent of A is C, and C
> >> >> > has async_suspend set.  On boot, C->power.completion is initialized
> >> >> > to 0.
> >> >> >
> >> >> > During the first suspend:
> >> >> > suspend_devices_and_enter(...)
> >> >> >  dpm_resume(...)
> >> >> >   device_suspend(A)
> >> >> >   device_suspend(B) returns error, aborts suspend
> >> >> >  dpm_resume_end(...)
> >> >> >    dpm_resume(...)
> >> >> >     device_resume(A)
> >> >> >      dpm_wait(A->parent == C)
> >> >> >       wait_for_completion(C->power.completion)
> >> >> >
> >> >> > The wait_for_completion will never complete, because
> >> >> > complete_all(C->power.completion) will only be called from
> >> >> > device_suspend(C) or device_resume(C), neither of which is called
> >> >> > if suspend is aborted before C.
> >> >>
> >> >> This would work okay if C->power.completion had been initialized to the
> >> >> completed state during boot, right?
> >> >>
> >> >> > After a successful suspend->resume cycle, where B doesn't abort
> >> >> > suspend, C->power.completion is left in the completed state by the
> >> >> > call to device_resume(C), and the same call path will work if B
> >> >> > aborts suspend.
> >> >> >
> >> >> > Signed-off-by: Colin Cross <ccross@android.com>
> >> >> > ---
> >> >> >  drivers/base/power/main.c |    3 ++-
> >> >> >  1 files changed, 2 insertions(+), 1 deletions(-)
> >> >> >
> >> >> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> >> >> > index cb784a0..e159910 100644
> >> >> > --- a/drivers/base/power/main.c
> >> >> > +++ b/drivers/base/power/main.c
> >> >> > @@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
> >> >> >     TRACE_DEVICE(dev);
> >> >> >     TRACE_RESUME(0);
> >> >> >
> >> >> > -   dpm_wait(dev->parent, async);
> >> >> > +   if (dev->parent && dev->parent->power.status >= DPM_OFF)
> >> >> > +           dpm_wait(dev->parent, async);
> >> >> >     device_lock(dev);
> >> >> >
> >> >> >     dev->power.status = DPM_RESUMING;
> >> >>
> >> >> I think it would be better to change device_pm_init() and add a
> >> >> complete_all().
> >> >
> >> > I agree.
> >> That would work, and was my first solution, but it increases the
> >> reliance on the completion variable being left completed between state
> >> transitions, which is undocumented and unnecessary.
> >
> > In fact it is necessary, because dpm_wait() may be called by external code
> > through device_pm_wait_for_dev() which is exported for a reason.  That may
> > lead to problems analogous to the one you described if the completion
> > variables are not completed initially.
> >
> >> It seems more straightforward to me to only wait on the parent if the parent is
> >> suspended.
> >>
> >> > Who's writing the patch?
> >> I'll write it if you still don't like this one.
> >
> > Yes, please.
> OK - do you prefer it in dpm_prepare or device_pm_init?

device_pm_init(), please.

Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 21:31         ` Colin Cross
  2010-09-02 21:40           ` Rafael J. Wysocki
@ 2010-09-02 21:40           ` Rafael J. Wysocki
  1 sibling, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-02 21:40 UTC (permalink / raw)
  To: Colin Cross
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thursday, September 02, 2010, Colin Cross wrote:
> On Thu, Sep 2, 2010 at 2:05 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Thursday, September 02, 2010, Colin Cross wrote:
> >> On Thu, Sep 2, 2010 at 12:46 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> > On Thursday, September 02, 2010, Alan Stern wrote:
> >> >> On Wed, 1 Sep 2010, Colin Cross wrote:
> >> >>
> >> >> > Only wait on a parent device during resume if the parent device is
> >> >> > suspended.
> >> >> >
> >> >> > Consider three drivers, A, B, and C.  The parent of A is C, and C
> >> >> > has async_suspend set.  On boot, C->power.completion is initialized
> >> >> > to 0.
> >> >> >
> >> >> > During the first suspend:
> >> >> > suspend_devices_and_enter(...)
> >> >> >  dpm_resume(...)
> >> >> >   device_suspend(A)
> >> >> >   device_suspend(B) returns error, aborts suspend
> >> >> >  dpm_resume_end(...)
> >> >> >    dpm_resume(...)
> >> >> >     device_resume(A)
> >> >> >      dpm_wait(A->parent == C)
> >> >> >       wait_for_completion(C->power.completion)
> >> >> >
> >> >> > The wait_for_completion will never complete, because
> >> >> > complete_all(C->power.completion) will only be called from
> >> >> > device_suspend(C) or device_resume(C), neither of which is called
> >> >> > if suspend is aborted before C.
> >> >>
> >> >> This would work okay if C->power.completion had been initialized to the
> >> >> completed state during boot, right?
> >> >>
> >> >> > After a successful suspend->resume cycle, where B doesn't abort
> >> >> > suspend, C->power.completion is left in the completed state by the
> >> >> > call to device_resume(C), and the same call path will work if B
> >> >> > aborts suspend.
> >> >> >
> >> >> > Signed-off-by: Colin Cross <ccross@android.com>
> >> >> > ---
> >> >> >  drivers/base/power/main.c |    3 ++-
> >> >> >  1 files changed, 2 insertions(+), 1 deletions(-)
> >> >> >
> >> >> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> >> >> > index cb784a0..e159910 100644
> >> >> > --- a/drivers/base/power/main.c
> >> >> > +++ b/drivers/base/power/main.c
> >> >> > @@ -526,7 +526,8 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
> >> >> >     TRACE_DEVICE(dev);
> >> >> >     TRACE_RESUME(0);
> >> >> >
> >> >> > -   dpm_wait(dev->parent, async);
> >> >> > +   if (dev->parent && dev->parent->power.status >= DPM_OFF)
> >> >> > +           dpm_wait(dev->parent, async);
> >> >> >     device_lock(dev);
> >> >> >
> >> >> >     dev->power.status = DPM_RESUMING;
> >> >>
> >> >> I think it would be better to change device_pm_init() and add a
> >> >> complete_all().
> >> >
> >> > I agree.
> >> That would work, and was my first solution, but it increases the
> >> reliance on the completion variable being left completed between state
> >> transitions, which is undocumented and unnecessary.
> >
> > In fact it is necessary, because dpm_wait() may be called by external code
> > through device_pm_wait_for_dev() which is exported for a reason.  That may
> > lead to problems analogous to the one you described if the completion
> > variables are not completed initially.
> >
> >> It seems more straightforward to me to only wait on the parent if the parent is
> >> suspended.
> >>
> >> > Who's writing the patch?
> >> I'll write it if you still don't like this one.
> >
> > Yes, please.
> OK - do you prefer it in dpm_prepare or device_pm_init?

device_pm_init(), please.

Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 21:34           ` Alan Stern
@ 2010-09-02 22:45             ` Colin Cross
  2010-09-02 23:09               ` Rafael J. Wysocki
  2010-09-02 23:09               ` Rafael J. Wysocki
  2010-09-02 22:45             ` Colin Cross
  1 sibling, 2 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-02 22:45 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, linux-kernel, linux-pm, Pavel Machek,
	Len Brown, Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thu, Sep 2, 2010 at 2:34 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 2 Sep 2010, Colin Cross wrote:
>
>> That would work, but I still don't see why it's better.  With either
>> of your changes, the power.completion variable is storing state, and
>> not just used for notification.  However, the exact meaning of that
>> state is unclear, especially during the transition from an aborted
>> suspend to resume, and the state is duplicating power.status.  Setting
>> it to complete in dpm_prepare is especially confusing, because at that
>> point nothing is completed, it hasn't even been started.
>
> The state being waited for varies from time to time and is only
> partially related to power.status.  Instead of using a completion I
> suppose we could have used a new "transition_complete" variable
> together with a waitqueue.  Would you prefer that?  It's effectively
> the same thing as a completion, but without the nice packaging already
> provided by the kernel.
No, that doesn't change anything.  What I'd prefer to see is a
wait_for_condition on the desired state of the parent.  As is,
power.completion means one thing during suspend (the device has
started, but not finished, suspending), and a different thing during
resume (the device has not finished resuming, and may not have started
resuming).  That difference is exactly what caused the bug - the
completion has to be set on init so that it is set before the device
starts suspend.

I'll send the complete_all on init patch, as it's the only way to fix
the problem given the current implementation of dpm_wait.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 21:34           ` Alan Stern
  2010-09-02 22:45             ` Colin Cross
@ 2010-09-02 22:45             ` Colin Cross
  1 sibling, 0 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-02 22:45 UTC (permalink / raw)
  To: Alan Stern
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thu, Sep 2, 2010 at 2:34 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 2 Sep 2010, Colin Cross wrote:
>
>> That would work, but I still don't see why it's better.  With either
>> of your changes, the power.completion variable is storing state, and
>> not just used for notification.  However, the exact meaning of that
>> state is unclear, especially during the transition from an aborted
>> suspend to resume, and the state is duplicating power.status.  Setting
>> it to complete in dpm_prepare is especially confusing, because at that
>> point nothing is completed, it hasn't even been started.
>
> The state being waited for varies from time to time and is only
> partially related to power.status.  Instead of using a completion I
> suppose we could have used a new "transition_complete" variable
> together with a waitqueue.  Would you prefer that?  It's effectively
> the same thing as a completion, but without the nice packaging already
> provided by the kernel.
No, that doesn't change anything.  What I'd prefer to see is a
wait_for_condition on the desired state of the parent.  As is,
power.completion means one thing during suspend (the device has
started, but not finished, suspending), and a different thing during
resume (the device has not finished resuming, and may not have started
resuming).  That difference is exactly what caused the bug - the
completion has to be set on init so that it is set before the device
starts suspend.

I'll send the complete_all on init patch, as it's the only way to fix
the problem given the current implementation of dpm_wait.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 21:40           ` Rafael J. Wysocki
@ 2010-09-02 22:59             ` Colin Cross
  2010-09-02 23:25                 ` Rafael J. Wysocki
  2010-09-02 22:59             ` Colin Cross
  1 sibling, 1 reply; 55+ messages in thread
From: Colin Cross @ 2010-09-02 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: Colin Cross, Pavel Machek, Rafael J. Wysocki, Len Brown,
	Greg Kroah-Hartman, Alan Stern, Randy Dunlap

During suspend, the power.completion is expected to be set when a
device has not yet started suspending.  Set it on init to fix a
corner case where a device is resumed when its parent has never
suspended.

Consider three drivers, A, B, and C.  The parent of A is C, and C
has async_suspend set.  On boot, C->power.completion is initialized
to 0.

During the first suspend:
suspend_devices_and_enter(...)
 dpm_resume(...)
  device_suspend(A)
  device_suspend(B) returns error, aborts suspend
 dpm_resume_end(...)
   dpm_resume(...)
    device_resume(A)
     dpm_wait(A->parent == C)
      wait_for_completion(C->power.completion)

The wait_for_completion will never complete, because
complete_all(C->power.completion) will only be called from
device_suspend(C) or device_resume(C), neither of which is called
if suspend is aborted before C.

After a successful suspend->resume cycle, where B doesn't abort
suspend, C->power.completion is left in the completed state by the
call to device_resume(C), and the same call path will work if B
aborts suspend.

Signed-off-by: Colin Cross <ccross@android.com>
---
 drivers/base/power/main.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index cb784a0..b1b4029 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -63,6 +63,7 @@ void device_pm_init(struct device *dev)
 {
 	dev->power.status = DPM_ON;
 	init_completion(&dev->power.completion);
+	complete_all(&dev->power.completion);
 	pm_runtime_init(dev);
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 21:40           ` Rafael J. Wysocki
  2010-09-02 22:59             ` [PATCH v2] " Colin Cross
@ 2010-09-02 22:59             ` Colin Cross
  1 sibling, 0 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-02 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, Colin Cross

During suspend, the power.completion is expected to be set when a
device has not yet started suspending.  Set it on init to fix a
corner case where a device is resumed when its parent has never
suspended.

Consider three drivers, A, B, and C.  The parent of A is C, and C
has async_suspend set.  On boot, C->power.completion is initialized
to 0.

During the first suspend:
suspend_devices_and_enter(...)
 dpm_resume(...)
  device_suspend(A)
  device_suspend(B) returns error, aborts suspend
 dpm_resume_end(...)
   dpm_resume(...)
    device_resume(A)
     dpm_wait(A->parent == C)
      wait_for_completion(C->power.completion)

The wait_for_completion will never complete, because
complete_all(C->power.completion) will only be called from
device_suspend(C) or device_resume(C), neither of which is called
if suspend is aborted before C.

After a successful suspend->resume cycle, where B doesn't abort
suspend, C->power.completion is left in the completed state by the
call to device_resume(C), and the same call path will work if B
aborts suspend.

Signed-off-by: Colin Cross <ccross@android.com>
---
 drivers/base/power/main.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index cb784a0..b1b4029 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -63,6 +63,7 @@ void device_pm_init(struct device *dev)
 {
 	dev->power.status = DPM_ON;
 	init_completion(&dev->power.completion);
+	complete_all(&dev->power.completion);
 	pm_runtime_init(dev);
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 22:45             ` Colin Cross
@ 2010-09-02 23:09               ` Rafael J. Wysocki
  2010-09-03  0:14                 ` Colin Cross
  2010-09-03  0:14                 ` Colin Cross
  2010-09-02 23:09               ` Rafael J. Wysocki
  1 sibling, 2 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-02 23:09 UTC (permalink / raw)
  To: Colin Cross
  Cc: Alan Stern, linux-kernel, linux-pm, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Friday, September 03, 2010, Colin Cross wrote:
> On Thu, Sep 2, 2010 at 2:34 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Thu, 2 Sep 2010, Colin Cross wrote:
> >
> >> That would work, but I still don't see why it's better.  With either
> >> of your changes, the power.completion variable is storing state, and
> >> not just used for notification.  However, the exact meaning of that
> >> state is unclear, especially during the transition from an aborted
> >> suspend to resume, and the state is duplicating power.status.  Setting
> >> it to complete in dpm_prepare is especially confusing, because at that
> >> point nothing is completed, it hasn't even been started.
> >
> > The state being waited for varies from time to time and is only
> > partially related to power.status.  Instead of using a completion I
> > suppose we could have used a new "transition_complete" variable
> > together with a waitqueue.  Would you prefer that?  It's effectively
> > the same thing as a completion, but without the nice packaging already
> > provided by the kernel.
> No, that doesn't change anything.  What I'd prefer to see is a
> wait_for_condition on the desired state of the parent.  As is,
> power.completion means one thing during suspend (the device has
> started, but not finished, suspending), and a different thing during
> resume (the device has not finished resuming, and may not have started
> resuming).  That difference is exactly what caused the bug - the
> completion has to be set on init so that it is set before the device
> starts suspend.

Not really.  The bug is there, because my analysis of the suspend error code
path was wrong.  Sorry about that, but it has nothing to do with the "different
meaning" of the completions during suspend and resume.

The completions here are simply used to enforce a specific ordering of
operations, nothing more.  They have no meaning beyond that.

> I'll send the complete_all on init patch, as it's the only way to fix
> the problem given the current implementation of dpm_wait.

Great, thanks.

Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 22:45             ` Colin Cross
  2010-09-02 23:09               ` Rafael J. Wysocki
@ 2010-09-02 23:09               ` Rafael J. Wysocki
  1 sibling, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-02 23:09 UTC (permalink / raw)
  To: Colin Cross
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Friday, September 03, 2010, Colin Cross wrote:
> On Thu, Sep 2, 2010 at 2:34 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Thu, 2 Sep 2010, Colin Cross wrote:
> >
> >> That would work, but I still don't see why it's better.  With either
> >> of your changes, the power.completion variable is storing state, and
> >> not just used for notification.  However, the exact meaning of that
> >> state is unclear, especially during the transition from an aborted
> >> suspend to resume, and the state is duplicating power.status.  Setting
> >> it to complete in dpm_prepare is especially confusing, because at that
> >> point nothing is completed, it hasn't even been started.
> >
> > The state being waited for varies from time to time and is only
> > partially related to power.status.  Instead of using a completion I
> > suppose we could have used a new "transition_complete" variable
> > together with a waitqueue.  Would you prefer that?  It's effectively
> > the same thing as a completion, but without the nice packaging already
> > provided by the kernel.
> No, that doesn't change anything.  What I'd prefer to see is a
> wait_for_condition on the desired state of the parent.  As is,
> power.completion means one thing during suspend (the device has
> started, but not finished, suspending), and a different thing during
> resume (the device has not finished resuming, and may not have started
> resuming).  That difference is exactly what caused the bug - the
> completion has to be set on init so that it is set before the device
> starts suspend.

Not really.  The bug is there, because my analysis of the suspend error code
path was wrong.  Sorry about that, but it has nothing to do with the "different
meaning" of the completions during suspend and resume.

The completions here are simply used to enforce a specific ordering of
operations, nothing more.  They have no meaning beyond that.

> I'll send the complete_all on init patch, as it's the only way to fix
> the problem given the current implementation of dpm_wait.

Great, thanks.

Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 22:59             ` [PATCH v2] " Colin Cross
@ 2010-09-02 23:25                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-02 23:25 UTC (permalink / raw)
  To: Colin Cross
  Cc: linux-kernel, linux-pm, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Alan Stern, Randy Dunlap

On Friday, September 03, 2010, Colin Cross wrote:
> During suspend, the power.completion is expected to be set when a
> device has not yet started suspending.  Set it on init to fix a
> corner case where a device is resumed when its parent has never
> suspended.
> 
> Consider three drivers, A, B, and C.  The parent of A is C, and C
> has async_suspend set.  On boot, C->power.completion is initialized
> to 0.
> 
> During the first suspend:
> suspend_devices_and_enter(...)
>  dpm_resume(...)
>   device_suspend(A)
>   device_suspend(B) returns error, aborts suspend
>  dpm_resume_end(...)
>    dpm_resume(...)
>     device_resume(A)
>      dpm_wait(A->parent == C)
>       wait_for_completion(C->power.completion)
> 
> The wait_for_completion will never complete, because
> complete_all(C->power.completion) will only be called from
> device_suspend(C) or device_resume(C), neither of which is called
> if suspend is aborted before C.
> 
> After a successful suspend->resume cycle, where B doesn't abort
> suspend, C->power.completion is left in the completed state by the
> call to device_resume(C), and the same call path will work if B
> aborts suspend.
> 
> Signed-off-by: Colin Cross <ccross@android.com>

Thanks, rebased on top of the current Linus' tree and applied to
suspend-2.6/pm-fixes .

Rafael


> ---
>  drivers/base/power/main.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index cb784a0..b1b4029 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -63,6 +63,7 @@ void device_pm_init(struct device *dev)
>  {
>  	dev->power.status = DPM_ON;
>  	init_completion(&dev->power.completion);
> +	complete_all(&dev->power.completion);
>  	pm_runtime_init(dev);
>  }
>  
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2] PM: Prevent waiting forever on asynchronous resume after abort
@ 2010-09-02 23:25                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-02 23:25 UTC (permalink / raw)
  To: Colin Cross
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel, linux-pm

On Friday, September 03, 2010, Colin Cross wrote:
> During suspend, the power.completion is expected to be set when a
> device has not yet started suspending.  Set it on init to fix a
> corner case where a device is resumed when its parent has never
> suspended.
> 
> Consider three drivers, A, B, and C.  The parent of A is C, and C
> has async_suspend set.  On boot, C->power.completion is initialized
> to 0.
> 
> During the first suspend:
> suspend_devices_and_enter(...)
>  dpm_resume(...)
>   device_suspend(A)
>   device_suspend(B) returns error, aborts suspend
>  dpm_resume_end(...)
>    dpm_resume(...)
>     device_resume(A)
>      dpm_wait(A->parent == C)
>       wait_for_completion(C->power.completion)
> 
> The wait_for_completion will never complete, because
> complete_all(C->power.completion) will only be called from
> device_suspend(C) or device_resume(C), neither of which is called
> if suspend is aborted before C.
> 
> After a successful suspend->resume cycle, where B doesn't abort
> suspend, C->power.completion is left in the completed state by the
> call to device_resume(C), and the same call path will work if B
> aborts suspend.
> 
> Signed-off-by: Colin Cross <ccross@android.com>

Thanks, rebased on top of the current Linus' tree and applied to
suspend-2.6/pm-fixes .

Rafael


> ---
>  drivers/base/power/main.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index cb784a0..b1b4029 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -63,6 +63,7 @@ void device_pm_init(struct device *dev)
>  {
>  	dev->power.status = DPM_ON;
>  	init_completion(&dev->power.completion);
> +	complete_all(&dev->power.completion);
>  	pm_runtime_init(dev);
>  }
>  
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 23:09               ` Rafael J. Wysocki
  2010-09-03  0:14                 ` Colin Cross
@ 2010-09-03  0:14                 ` Colin Cross
  2010-09-03  0:35                   ` Rafael J. Wysocki
  2010-09-03  0:35                   ` Rafael J. Wysocki
  1 sibling, 2 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-03  0:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, linux-kernel, linux-pm, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thu, Sep 2, 2010 at 4:09 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Friday, September 03, 2010, Colin Cross wrote:
>> On Thu, Sep 2, 2010 at 2:34 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
>> > On Thu, 2 Sep 2010, Colin Cross wrote:
>> >
>> >> That would work, but I still don't see why it's better.  With either
>> >> of your changes, the power.completion variable is storing state, and
>> >> not just used for notification.  However, the exact meaning of that
>> >> state is unclear, especially during the transition from an aborted
>> >> suspend to resume, and the state is duplicating power.status.  Setting
>> >> it to complete in dpm_prepare is especially confusing, because at that
>> >> point nothing is completed, it hasn't even been started.
>> >
>> > The state being waited for varies from time to time and is only
>> > partially related to power.status.  Instead of using a completion I
>> > suppose we could have used a new "transition_complete" variable
>> > together with a waitqueue.  Would you prefer that?  It's effectively
>> > the same thing as a completion, but without the nice packaging already
>> > provided by the kernel.
>> No, that doesn't change anything.  What I'd prefer to see is a
>> wait_for_condition on the desired state of the parent.  As is,
>> power.completion means one thing during suspend (the device has
>> started, but not finished, suspending), and a different thing during
>> resume (the device has not finished resuming, and may not have started
>> resuming).  That difference is exactly what caused the bug - the
>> completion has to be set on init so that it is set before the device
>> starts suspend.
>
> Not really.  The bug is there, because my analysis of the suspend error code
> path was wrong.  Sorry about that, but it has nothing to do with the "different
> meaning" of the completions during suspend and resume.
>
> The completions here are simply used to enforce a specific ordering of
> operations, nothing more.  They have no meaning beyond that.
The completion variable maintains state.  It has meaning whether or
not you want it to.  Leaving it as a completion variable requires that
you manage that state, which is difficult considering there is no
documentation and no clear idea in the code of exactly when that state
is set or clear.

It would be much cleaner to use a wait queue, and use
wait_on_condition to wait for the device to be in the desired state.

>> I'll send the complete_all on init patch, as it's the only way to fix
>> the problem given the current implementation of dpm_wait.
>
> Great, thanks.
>
> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-02 23:09               ` Rafael J. Wysocki
@ 2010-09-03  0:14                 ` Colin Cross
  2010-09-03  0:14                 ` Colin Cross
  1 sibling, 0 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-03  0:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thu, Sep 2, 2010 at 4:09 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Friday, September 03, 2010, Colin Cross wrote:
>> On Thu, Sep 2, 2010 at 2:34 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
>> > On Thu, 2 Sep 2010, Colin Cross wrote:
>> >
>> >> That would work, but I still don't see why it's better.  With either
>> >> of your changes, the power.completion variable is storing state, and
>> >> not just used for notification.  However, the exact meaning of that
>> >> state is unclear, especially during the transition from an aborted
>> >> suspend to resume, and the state is duplicating power.status.  Setting
>> >> it to complete in dpm_prepare is especially confusing, because at that
>> >> point nothing is completed, it hasn't even been started.
>> >
>> > The state being waited for varies from time to time and is only
>> > partially related to power.status.  Instead of using a completion I
>> > suppose we could have used a new "transition_complete" variable
>> > together with a waitqueue.  Would you prefer that?  It's effectively
>> > the same thing as a completion, but without the nice packaging already
>> > provided by the kernel.
>> No, that doesn't change anything.  What I'd prefer to see is a
>> wait_for_condition on the desired state of the parent.  As is,
>> power.completion means one thing during suspend (the device has
>> started, but not finished, suspending), and a different thing during
>> resume (the device has not finished resuming, and may not have started
>> resuming).  That difference is exactly what caused the bug - the
>> completion has to be set on init so that it is set before the device
>> starts suspend.
>
> Not really.  The bug is there, because my analysis of the suspend error code
> path was wrong.  Sorry about that, but it has nothing to do with the "different
> meaning" of the completions during suspend and resume.
>
> The completions here are simply used to enforce a specific ordering of
> operations, nothing more.  They have no meaning beyond that.
The completion variable maintains state.  It has meaning whether or
not you want it to.  Leaving it as a completion variable requires that
you manage that state, which is difficult considering there is no
documentation and no clear idea in the code of exactly when that state
is set or clear.

It would be much cleaner to use a wait queue, and use
wait_on_condition to wait for the device to be in the desired state.

>> I'll send the complete_all on init patch, as it's the only way to fix
>> the problem given the current implementation of dpm_wait.
>
> Great, thanks.
>
> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-03  0:14                 ` Colin Cross
  2010-09-03  0:35                   ` Rafael J. Wysocki
@ 2010-09-03  0:35                   ` Rafael J. Wysocki
  2010-09-03  1:54                     ` Colin Cross
  2010-09-03  1:54                     ` Colin Cross
  1 sibling, 2 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-03  0:35 UTC (permalink / raw)
  To: Colin Cross
  Cc: Alan Stern, linux-kernel, linux-pm, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Friday, September 03, 2010, Colin Cross wrote:
> On Thu, Sep 2, 2010 at 4:09 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Friday, September 03, 2010, Colin Cross wrote:
> >> On Thu, Sep 2, 2010 at 2:34 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> >> > On Thu, 2 Sep 2010, Colin Cross wrote:
> >> >
> >> >> That would work, but I still don't see why it's better.  With either
> >> >> of your changes, the power.completion variable is storing state, and
> >> >> not just used for notification.  However, the exact meaning of that
> >> >> state is unclear, especially during the transition from an aborted
> >> >> suspend to resume, and the state is duplicating power.status.  Setting
> >> >> it to complete in dpm_prepare is especially confusing, because at that
> >> >> point nothing is completed, it hasn't even been started.
> >> >
> >> > The state being waited for varies from time to time and is only
> >> > partially related to power.status.  Instead of using a completion I
> >> > suppose we could have used a new "transition_complete" variable
> >> > together with a waitqueue.  Would you prefer that?  It's effectively
> >> > the same thing as a completion, but without the nice packaging already
> >> > provided by the kernel.
> >> No, that doesn't change anything.  What I'd prefer to see is a
> >> wait_for_condition on the desired state of the parent.  As is,
> >> power.completion means one thing during suspend (the device has
> >> started, but not finished, suspending), and a different thing during
> >> resume (the device has not finished resuming, and may not have started
> >> resuming).  That difference is exactly what caused the bug - the
> >> completion has to be set on init so that it is set before the device
> >> starts suspend.
> >
> > Not really.  The bug is there, because my analysis of the suspend error code
> > path was wrong.  Sorry about that, but it has nothing to do with the "different
> > meaning" of the completions during suspend and resume.
> >
> > The completions here are simply used to enforce a specific ordering of
> > operations, nothing more.  They have no meaning beyond that.
>
> The completion variable maintains state.

So what?  Locks also maintain state.

> It has meaning whether or not you want it to.  Leaving it as a completion
> variable requires that you manage that state, which is difficult considering
> there is no documentation and no clear idea in the code of exactly when that
> state is set or clear.

Please run "git show 5af84b82701a96be4b033aaa51d86c72e2ded061" and read the
changelog.  It's described in there quite clearly (I think).

> It would be much cleaner to use a wait queue, and use
> wait_on_condition to wait for the device to be in the desired state.

Well, in fact that was used in one version of the patchset that introduced
asynchronous suspend-resume, but it was rejected by Linus, because it was
based on non-standard synchronization.  The Linus' argument, that I agreed
with, was that standard snychronization constructs, such as locks or
completions, were guaranteed to work accross different architectures and thus
were simply _safer_ to use than open-coded synchronization that you seem to be
preferring.

Completions simply allowed us to get the desired behavior with the least
effort and that's why we used them.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-03  0:14                 ` Colin Cross
@ 2010-09-03  0:35                   ` Rafael J. Wysocki
  2010-09-03  0:35                   ` Rafael J. Wysocki
  1 sibling, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-03  0:35 UTC (permalink / raw)
  To: Colin Cross
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Friday, September 03, 2010, Colin Cross wrote:
> On Thu, Sep 2, 2010 at 4:09 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Friday, September 03, 2010, Colin Cross wrote:
> >> On Thu, Sep 2, 2010 at 2:34 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> >> > On Thu, 2 Sep 2010, Colin Cross wrote:
> >> >
> >> >> That would work, but I still don't see why it's better.  With either
> >> >> of your changes, the power.completion variable is storing state, and
> >> >> not just used for notification.  However, the exact meaning of that
> >> >> state is unclear, especially during the transition from an aborted
> >> >> suspend to resume, and the state is duplicating power.status.  Setting
> >> >> it to complete in dpm_prepare is especially confusing, because at that
> >> >> point nothing is completed, it hasn't even been started.
> >> >
> >> > The state being waited for varies from time to time and is only
> >> > partially related to power.status.  Instead of using a completion I
> >> > suppose we could have used a new "transition_complete" variable
> >> > together with a waitqueue.  Would you prefer that?  It's effectively
> >> > the same thing as a completion, but without the nice packaging already
> >> > provided by the kernel.
> >> No, that doesn't change anything.  What I'd prefer to see is a
> >> wait_for_condition on the desired state of the parent.  As is,
> >> power.completion means one thing during suspend (the device has
> >> started, but not finished, suspending), and a different thing during
> >> resume (the device has not finished resuming, and may not have started
> >> resuming).  That difference is exactly what caused the bug - the
> >> completion has to be set on init so that it is set before the device
> >> starts suspend.
> >
> > Not really.  The bug is there, because my analysis of the suspend error code
> > path was wrong.  Sorry about that, but it has nothing to do with the "different
> > meaning" of the completions during suspend and resume.
> >
> > The completions here are simply used to enforce a specific ordering of
> > operations, nothing more.  They have no meaning beyond that.
>
> The completion variable maintains state.

So what?  Locks also maintain state.

> It has meaning whether or not you want it to.  Leaving it as a completion
> variable requires that you manage that state, which is difficult considering
> there is no documentation and no clear idea in the code of exactly when that
> state is set or clear.

Please run "git show 5af84b82701a96be4b033aaa51d86c72e2ded061" and read the
changelog.  It's described in there quite clearly (I think).

> It would be much cleaner to use a wait queue, and use
> wait_on_condition to wait for the device to be in the desired state.

Well, in fact that was used in one version of the patchset that introduced
asynchronous suspend-resume, but it was rejected by Linus, because it was
based on non-standard synchronization.  The Linus' argument, that I agreed
with, was that standard snychronization constructs, such as locks or
completions, were guaranteed to work accross different architectures and thus
were simply _safer_ to use than open-coded synchronization that you seem to be
preferring.

Completions simply allowed us to get the desired behavior with the least
effort and that's why we used them.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-03  0:35                   ` Rafael J. Wysocki
  2010-09-03  1:54                     ` Colin Cross
@ 2010-09-03  1:54                     ` Colin Cross
  2010-09-03  2:42                       ` Alan Stern
  2010-09-03  2:42                       ` Alan Stern
  1 sibling, 2 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-03  1:54 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, linux-kernel, linux-pm, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thu, Sep 2, 2010 at 5:35 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Friday, September 03, 2010, Colin Cross wrote:
>> On Thu, Sep 2, 2010 at 4:09 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> > On Friday, September 03, 2010, Colin Cross wrote:
>> >> On Thu, Sep 2, 2010 at 2:34 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
>> >> > On Thu, 2 Sep 2010, Colin Cross wrote:
>> >> >
>> >> >> That would work, but I still don't see why it's better.  With either
>> >> >> of your changes, the power.completion variable is storing state, and
>> >> >> not just used for notification.  However, the exact meaning of that
>> >> >> state is unclear, especially during the transition from an aborted
>> >> >> suspend to resume, and the state is duplicating power.status.  Setting
>> >> >> it to complete in dpm_prepare is especially confusing, because at that
>> >> >> point nothing is completed, it hasn't even been started.
>> >> >
>> >> > The state being waited for varies from time to time and is only
>> >> > partially related to power.status.  Instead of using a completion I
>> >> > suppose we could have used a new "transition_complete" variable
>> >> > together with a waitqueue.  Would you prefer that?  It's effectively
>> >> > the same thing as a completion, but without the nice packaging already
>> >> > provided by the kernel.
>> >> No, that doesn't change anything.  What I'd prefer to see is a
>> >> wait_for_condition on the desired state of the parent.  As is,
>> >> power.completion means one thing during suspend (the device has
>> >> started, but not finished, suspending), and a different thing during
>> >> resume (the device has not finished resuming, and may not have started
>> >> resuming).  That difference is exactly what caused the bug - the
>> >> completion has to be set on init so that it is set before the device
>> >> starts suspend.
>> >
>> > Not really.  The bug is there, because my analysis of the suspend error code
>> > path was wrong.  Sorry about that, but it has nothing to do with the "different
>> > meaning" of the completions during suspend and resume.
>> >
>> > The completions here are simply used to enforce a specific ordering of
>> > operations, nothing more.  They have no meaning beyond that.
>>
>> The completion variable maintains state.
>
> So what?  Locks also maintain state.
>
>> It has meaning whether or not you want it to.  Leaving it as a completion
>> variable requires that you manage that state, which is difficult considering
>> there is no documentation and no clear idea in the code of exactly when that
>> state is set or clear.
>
> Please run "git show 5af84b82701a96be4b033aaa51d86c72e2ded061" and read the
> changelog.  It's described in there quite clearly (I think).
Yes, that is very clear, sorry I didn't see it before.  A simple
description closer to the code would have helped me.

>> It would be much cleaner to use a wait queue, and use
>> wait_on_condition to wait for the device to be in the desired state.
>
> Well, in fact that was used in one version of the patchset that introduced
> asynchronous suspend-resume, but it was rejected by Linus, because it was
> based on non-standard synchronization.  The Linus' argument, that I agreed
> with, was that standard snychronization constructs, such as locks or
> completions, were guaranteed to work accross different architectures and thus
> were simply _safer_ to use than open-coded synchronization that you seem to be
> preferring.
If I'm reading the right thread, that was using rwlocks, not
conditions.  wait_on_condition looks just as cross-architecture as
completions, and is almost as simple.

I look at it like this:  Are you waiting for something to complete, or
are you waiting for something to be in a specific state?  Completion
works great if you know that you will only want to wait after it
starts.  That's not true for an aborted suspend - you may call
dpm_wait on a device that has never started resuming, because it never
suspended.  You can use a completion, and make sure it's state is
right for all the corner cases, but at the end of the day that's not
what you mean.  What you mean is "wait on the device to be resumed",
and that's a condition, not a simple completion event.

> Completions simply allowed us to get the desired behavior with the least
> effort and that's why we used them.
I'm happy with the end result, but I may submit a patch that converts
the completions to conditions for discussion.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-03  0:35                   ` Rafael J. Wysocki
@ 2010-09-03  1:54                     ` Colin Cross
  2010-09-03  1:54                     ` Colin Cross
  1 sibling, 0 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-03  1:54 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thu, Sep 2, 2010 at 5:35 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Friday, September 03, 2010, Colin Cross wrote:
>> On Thu, Sep 2, 2010 at 4:09 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> > On Friday, September 03, 2010, Colin Cross wrote:
>> >> On Thu, Sep 2, 2010 at 2:34 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
>> >> > On Thu, 2 Sep 2010, Colin Cross wrote:
>> >> >
>> >> >> That would work, but I still don't see why it's better.  With either
>> >> >> of your changes, the power.completion variable is storing state, and
>> >> >> not just used for notification.  However, the exact meaning of that
>> >> >> state is unclear, especially during the transition from an aborted
>> >> >> suspend to resume, and the state is duplicating power.status.  Setting
>> >> >> it to complete in dpm_prepare is especially confusing, because at that
>> >> >> point nothing is completed, it hasn't even been started.
>> >> >
>> >> > The state being waited for varies from time to time and is only
>> >> > partially related to power.status.  Instead of using a completion I
>> >> > suppose we could have used a new "transition_complete" variable
>> >> > together with a waitqueue.  Would you prefer that?  It's effectively
>> >> > the same thing as a completion, but without the nice packaging already
>> >> > provided by the kernel.
>> >> No, that doesn't change anything.  What I'd prefer to see is a
>> >> wait_for_condition on the desired state of the parent.  As is,
>> >> power.completion means one thing during suspend (the device has
>> >> started, but not finished, suspending), and a different thing during
>> >> resume (the device has not finished resuming, and may not have started
>> >> resuming).  That difference is exactly what caused the bug - the
>> >> completion has to be set on init so that it is set before the device
>> >> starts suspend.
>> >
>> > Not really.  The bug is there, because my analysis of the suspend error code
>> > path was wrong.  Sorry about that, but it has nothing to do with the "different
>> > meaning" of the completions during suspend and resume.
>> >
>> > The completions here are simply used to enforce a specific ordering of
>> > operations, nothing more.  They have no meaning beyond that.
>>
>> The completion variable maintains state.
>
> So what?  Locks also maintain state.
>
>> It has meaning whether or not you want it to.  Leaving it as a completion
>> variable requires that you manage that state, which is difficult considering
>> there is no documentation and no clear idea in the code of exactly when that
>> state is set or clear.
>
> Please run "git show 5af84b82701a96be4b033aaa51d86c72e2ded061" and read the
> changelog.  It's described in there quite clearly (I think).
Yes, that is very clear, sorry I didn't see it before.  A simple
description closer to the code would have helped me.

>> It would be much cleaner to use a wait queue, and use
>> wait_on_condition to wait for the device to be in the desired state.
>
> Well, in fact that was used in one version of the patchset that introduced
> asynchronous suspend-resume, but it was rejected by Linus, because it was
> based on non-standard synchronization.  The Linus' argument, that I agreed
> with, was that standard snychronization constructs, such as locks or
> completions, were guaranteed to work accross different architectures and thus
> were simply _safer_ to use than open-coded synchronization that you seem to be
> preferring.
If I'm reading the right thread, that was using rwlocks, not
conditions.  wait_on_condition looks just as cross-architecture as
completions, and is almost as simple.

I look at it like this:  Are you waiting for something to complete, or
are you waiting for something to be in a specific state?  Completion
works great if you know that you will only want to wait after it
starts.  That's not true for an aborted suspend - you may call
dpm_wait on a device that has never started resuming, because it never
suspended.  You can use a completion, and make sure it's state is
right for all the corner cases, but at the end of the day that's not
what you mean.  What you mean is "wait on the device to be resumed",
and that's a condition, not a simple completion event.

> Completions simply allowed us to get the desired behavior with the least
> effort and that's why we used them.
I'm happy with the end result, but I may submit a patch that converts
the completions to conditions for discussion.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-03  1:54                     ` Colin Cross
@ 2010-09-03  2:42                       ` Alan Stern
  2010-09-03  4:30                         ` Colin Cross
  2010-09-03  4:30                         ` Colin Cross
  2010-09-03  2:42                       ` Alan Stern
  1 sibling, 2 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-03  2:42 UTC (permalink / raw)
  To: Colin Cross
  Cc: Rafael J. Wysocki, linux-kernel, linux-pm, Pavel Machek,
	Len Brown, Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thu, 2 Sep 2010, Colin Cross wrote:

> > Well, in fact that was used in one version of the patchset that introduced
> > asynchronous suspend-resume, but it was rejected by Linus, because it was
> > based on non-standard synchronization.  The Linus' argument, that I agreed
> > with, was that standard snychronization constructs, such as locks or
> > completions, were guaranteed to work accross different architectures and thus
> > were simply _safer_ to use than open-coded synchronization that you seem to be
> > preferring.
> If I'm reading the right thread, that was using rwlocks, not
> conditions.

No, the thread talked about rwsems, not rwlocks.  And that's not what 
Linus didn't like -- indeed, using rwsems was his idea.  He didn't like 
non-standard open-coded synchronization.

>  wait_on_condition looks just as cross-architecture as
> completions, and is almost as simple.

Do you mean wait_event?  It doesn't include the synchronization that 
completions have.

> I look at it like this:  Are you waiting for something to complete, or
> are you waiting for something to be in a specific state?  Completion
> works great if you know that you will only want to wait after it
> starts.  That's not true for an aborted suspend - you may call
> dpm_wait on a device that has never started resuming, because it never
> suspended.  You can use a completion, and make sure it's state is
> right for all the corner cases, but at the end of the day that's not
> what you mean.  What you mean is "wait on the device to be resumed",
> and that's a condition, not a simple completion event.
> 
> > Completions simply allowed us to get the desired behavior with the least
> > effort and that's why we used them.
> I'm happy with the end result, but I may submit a patch that converts
> the completions to conditions for discussion.

Be sure to add memory barriers at the appropriate places.  That's what 
Linus was complaining about in the earlier approach.

Alan Stern


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-03  1:54                     ` Colin Cross
  2010-09-03  2:42                       ` Alan Stern
@ 2010-09-03  2:42                       ` Alan Stern
  1 sibling, 0 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-03  2:42 UTC (permalink / raw)
  To: Colin Cross
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thu, 2 Sep 2010, Colin Cross wrote:

> > Well, in fact that was used in one version of the patchset that introduced
> > asynchronous suspend-resume, but it was rejected by Linus, because it was
> > based on non-standard synchronization.  The Linus' argument, that I agreed
> > with, was that standard snychronization constructs, such as locks or
> > completions, were guaranteed to work accross different architectures and thus
> > were simply _safer_ to use than open-coded synchronization that you seem to be
> > preferring.
> If I'm reading the right thread, that was using rwlocks, not
> conditions.

No, the thread talked about rwsems, not rwlocks.  And that's not what 
Linus didn't like -- indeed, using rwsems was his idea.  He didn't like 
non-standard open-coded synchronization.

>  wait_on_condition looks just as cross-architecture as
> completions, and is almost as simple.

Do you mean wait_event?  It doesn't include the synchronization that 
completions have.

> I look at it like this:  Are you waiting for something to complete, or
> are you waiting for something to be in a specific state?  Completion
> works great if you know that you will only want to wait after it
> starts.  That's not true for an aborted suspend - you may call
> dpm_wait on a device that has never started resuming, because it never
> suspended.  You can use a completion, and make sure it's state is
> right for all the corner cases, but at the end of the day that's not
> what you mean.  What you mean is "wait on the device to be resumed",
> and that's a condition, not a simple completion event.
> 
> > Completions simply allowed us to get the desired behavior with the least
> > effort and that's why we used them.
> I'm happy with the end result, but I may submit a patch that converts
> the completions to conditions for discussion.

Be sure to add memory barriers at the appropriate places.  That's what 
Linus was complaining about in the earlier approach.

Alan Stern

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-03  2:42                       ` Alan Stern
@ 2010-09-03  4:30                         ` Colin Cross
  2010-09-03 14:04                           ` Alan Stern
  2010-09-03 14:04                           ` Alan Stern
  2010-09-03  4:30                         ` Colin Cross
  1 sibling, 2 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-03  4:30 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, linux-kernel, linux-pm, Pavel Machek,
	Len Brown, Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thu, Sep 2, 2010 at 7:42 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 2 Sep 2010, Colin Cross wrote:
>
>> > Well, in fact that was used in one version of the patchset that introduced
>> > asynchronous suspend-resume, but it was rejected by Linus, because it was
>> > based on non-standard synchronization.  The Linus' argument, that I agreed
>> > with, was that standard snychronization constructs, such as locks or
>> > completions, were guaranteed to work accross different architectures and thus
>> > were simply _safer_ to use than open-coded synchronization that you seem to be
>> > preferring.
>> If I'm reading the right thread, that was using rwlocks, not
>> conditions.
>
> No, the thread talked about rwsems, not rwlocks.  And that's not what
> Linus didn't like -- indeed, using rwsems was his idea.  He didn't like
> non-standard open-coded synchronization.
>
>>  wait_on_condition looks just as cross-architecture as
>> completions, and is almost as simple.
>
> Do you mean wait_event?  It doesn't include the synchronization that
> completions have.
>
>> I look at it like this:  Are you waiting for something to complete, or
>> are you waiting for something to be in a specific state?  Completion
>> works great if you know that you will only want to wait after it
>> starts.  That's not true for an aborted suspend - you may call
>> dpm_wait on a device that has never started resuming, because it never
>> suspended.  You can use a completion, and make sure it's state is
>> right for all the corner cases, but at the end of the day that's not
>> what you mean.  What you mean is "wait on the device to be resumed",
>> and that's a condition, not a simple completion event.
>>
>> > Completions simply allowed us to get the desired behavior with the least
>> > effort and that's why we used them.
>> I'm happy with the end result, but I may submit a patch that converts
>> the completions to conditions for discussion.
>
> Be sure to add memory barriers at the appropriate places.  That's what
> Linus was complaining about in the earlier approach.

You're right, wait_event would be much worse.

I think there's another race condition during suspend.  If an
asynchronous device calls device_pm_wait_for_dev on a device that
hasn't had device_suspend called on it yet, power.completion will
still be set from initialization or the last time it completed resume,
and it won't wait.

Assuming that problem is fixed somehow, there's also a deadlock
possibility.  Consider 3 devices.  A, B, and C, registered in that
order.  A is async, and the suspend handler calls
device_pm_wait_for_dev(C).  B's suspend handler returns an error.  A's
suspend handler is now stuck waiting on C->power.completion, but
device_suspend(C) will never be called.

There are also an unhandled edge condition - what is the expected
behavior for a call to device_pm_wait_for_dev on a device if the
suspend handler for that device returns an error?  Currently, the
calling device will continue as if the target device had suspended.

What about splitting power.completion into two flags,
power.suspend_complete and power.resume_complete?
power.resume_complete is initialized to 1, because the devices start
resumed.  Clear power.suspend_complete for all devices at the
beginning of dpm_suspend, and clear power.resume_complete for any
device that is suspended at the beginning of dpm_resume.  The
semantics of each flag is then always clear.  Any time between the
beginning and end of dpm_suspend, waiting on any device's
power.suspend_complete will block until that device is in suspend.
Any time between the beginning and end of dpm_resume, waiting on
power.resume_complete will block IFF the device is suspended.

A solution to the 2nd and 3rd problems would still be needed - a way
to abort drivers that call device_pm_wait_for_dev when suspend is
aborted, and a return value to tell them the device being waited on is
not suspended.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-03  2:42                       ` Alan Stern
  2010-09-03  4:30                         ` Colin Cross
@ 2010-09-03  4:30                         ` Colin Cross
  1 sibling, 0 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-03  4:30 UTC (permalink / raw)
  To: Alan Stern
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thu, Sep 2, 2010 at 7:42 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 2 Sep 2010, Colin Cross wrote:
>
>> > Well, in fact that was used in one version of the patchset that introduced
>> > asynchronous suspend-resume, but it was rejected by Linus, because it was
>> > based on non-standard synchronization.  The Linus' argument, that I agreed
>> > with, was that standard snychronization constructs, such as locks or
>> > completions, were guaranteed to work accross different architectures and thus
>> > were simply _safer_ to use than open-coded synchronization that you seem to be
>> > preferring.
>> If I'm reading the right thread, that was using rwlocks, not
>> conditions.
>
> No, the thread talked about rwsems, not rwlocks.  And that's not what
> Linus didn't like -- indeed, using rwsems was his idea.  He didn't like
> non-standard open-coded synchronization.
>
>>  wait_on_condition looks just as cross-architecture as
>> completions, and is almost as simple.
>
> Do you mean wait_event?  It doesn't include the synchronization that
> completions have.
>
>> I look at it like this:  Are you waiting for something to complete, or
>> are you waiting for something to be in a specific state?  Completion
>> works great if you know that you will only want to wait after it
>> starts.  That's not true for an aborted suspend - you may call
>> dpm_wait on a device that has never started resuming, because it never
>> suspended.  You can use a completion, and make sure it's state is
>> right for all the corner cases, but at the end of the day that's not
>> what you mean.  What you mean is "wait on the device to be resumed",
>> and that's a condition, not a simple completion event.
>>
>> > Completions simply allowed us to get the desired behavior with the least
>> > effort and that's why we used them.
>> I'm happy with the end result, but I may submit a patch that converts
>> the completions to conditions for discussion.
>
> Be sure to add memory barriers at the appropriate places.  That's what
> Linus was complaining about in the earlier approach.

You're right, wait_event would be much worse.

I think there's another race condition during suspend.  If an
asynchronous device calls device_pm_wait_for_dev on a device that
hasn't had device_suspend called on it yet, power.completion will
still be set from initialization or the last time it completed resume,
and it won't wait.

Assuming that problem is fixed somehow, there's also a deadlock
possibility.  Consider 3 devices.  A, B, and C, registered in that
order.  A is async, and the suspend handler calls
device_pm_wait_for_dev(C).  B's suspend handler returns an error.  A's
suspend handler is now stuck waiting on C->power.completion, but
device_suspend(C) will never be called.

There are also an unhandled edge condition - what is the expected
behavior for a call to device_pm_wait_for_dev on a device if the
suspend handler for that device returns an error?  Currently, the
calling device will continue as if the target device had suspended.

What about splitting power.completion into two flags,
power.suspend_complete and power.resume_complete?
power.resume_complete is initialized to 1, because the devices start
resumed.  Clear power.suspend_complete for all devices at the
beginning of dpm_suspend, and clear power.resume_complete for any
device that is suspended at the beginning of dpm_resume.  The
semantics of each flag is then always clear.  Any time between the
beginning and end of dpm_suspend, waiting on any device's
power.suspend_complete will block until that device is in suspend.
Any time between the beginning and end of dpm_resume, waiting on
power.resume_complete will block IFF the device is suspended.

A solution to the 2nd and 3rd problems would still be needed - a way
to abort drivers that call device_pm_wait_for_dev when suspend is
aborted, and a return value to tell them the device being waited on is
not suspended.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-03  4:30                         ` Colin Cross
@ 2010-09-03 14:04                           ` Alan Stern
  2010-09-03 16:48                             ` Colin Cross
  2010-09-03 16:48                             ` [PATCH] PM: Prevent waiting forever on asynchronous resume after abort Colin Cross
  2010-09-03 14:04                           ` Alan Stern
  1 sibling, 2 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-03 14:04 UTC (permalink / raw)
  To: Colin Cross
  Cc: Rafael J. Wysocki, linux-kernel, linux-pm, Pavel Machek,
	Len Brown, Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Thu, 2 Sep 2010, Colin Cross wrote:

> You're right, wait_event would be much worse.
> 
> I think there's another race condition during suspend.  If an
> asynchronous device calls device_pm_wait_for_dev on a device that
> hasn't had device_suspend called on it yet, power.completion will
> still be set from initialization or the last time it completed resume,
> and it won't wait.

That can't happen in a properly-designed system.  It would mean the 
async device didn't suspend because it was waiting for a device which 
was registered before it -- and that would deadlock even if you used 
synchronous suspend.

> Assuming that problem is fixed somehow, there's also a deadlock
> possibility.  Consider 3 devices.  A, B, and C, registered in that
> order.  A is async, and the suspend handler calls
> device_pm_wait_for_dev(C).  B's suspend handler returns an error.  A's
> suspend handler is now stuck waiting on C->power.completion, but
> device_suspend(C) will never be called.

Why not?  The normal suspend order is last-to-first, so C will be 
suspended before B.

> There are also an unhandled edge condition - what is the expected
> behavior for a call to device_pm_wait_for_dev on a device if the
> suspend handler for that device returns an error?  Currently, the
> calling device will continue as if the target device had suspended.

It looks like __device_suspend needs to set async_error.  Which means 
async_suspend doesn't need to set it.  This is indeed a bug.

> What about splitting power.completion into two flags,
> power.suspend_complete and power.resume_complete?
> power.resume_complete is initialized to 1, because the devices start
> resumed.  Clear power.suspend_complete for all devices at the
> beginning of dpm_suspend, and clear power.resume_complete for any
> device that is suspended at the beginning of dpm_resume.  The
> semantics of each flag is then always clear.  Any time between the
> beginning and end of dpm_suspend, waiting on any device's
> power.suspend_complete will block until that device is in suspend.
> Any time between the beginning and end of dpm_resume, waiting on
> power.resume_complete will block IFF the device is suspended.

How are you going to wait for these things?  With wait_event?  Didn't 
you say above that it would be worse than using completions?

> A solution to the 2nd and 3rd problems would still be needed - a way
> to abort drivers that call device_pm_wait_for_dev when suspend is
> aborted, and a return value to tell them the device being waited on is
> not suspended.

No solutions are needed.  See above.

Alan Stern


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-03  4:30                         ` Colin Cross
  2010-09-03 14:04                           ` Alan Stern
@ 2010-09-03 14:04                           ` Alan Stern
  1 sibling, 0 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-03 14:04 UTC (permalink / raw)
  To: Colin Cross
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Thu, 2 Sep 2010, Colin Cross wrote:

> You're right, wait_event would be much worse.
> 
> I think there's another race condition during suspend.  If an
> asynchronous device calls device_pm_wait_for_dev on a device that
> hasn't had device_suspend called on it yet, power.completion will
> still be set from initialization or the last time it completed resume,
> and it won't wait.

That can't happen in a properly-designed system.  It would mean the 
async device didn't suspend because it was waiting for a device which 
was registered before it -- and that would deadlock even if you used 
synchronous suspend.

> Assuming that problem is fixed somehow, there's also a deadlock
> possibility.  Consider 3 devices.  A, B, and C, registered in that
> order.  A is async, and the suspend handler calls
> device_pm_wait_for_dev(C).  B's suspend handler returns an error.  A's
> suspend handler is now stuck waiting on C->power.completion, but
> device_suspend(C) will never be called.

Why not?  The normal suspend order is last-to-first, so C will be 
suspended before B.

> There are also an unhandled edge condition - what is the expected
> behavior for a call to device_pm_wait_for_dev on a device if the
> suspend handler for that device returns an error?  Currently, the
> calling device will continue as if the target device had suspended.

It looks like __device_suspend needs to set async_error.  Which means 
async_suspend doesn't need to set it.  This is indeed a bug.

> What about splitting power.completion into two flags,
> power.suspend_complete and power.resume_complete?
> power.resume_complete is initialized to 1, because the devices start
> resumed.  Clear power.suspend_complete for all devices at the
> beginning of dpm_suspend, and clear power.resume_complete for any
> device that is suspended at the beginning of dpm_resume.  The
> semantics of each flag is then always clear.  Any time between the
> beginning and end of dpm_suspend, waiting on any device's
> power.suspend_complete will block until that device is in suspend.
> Any time between the beginning and end of dpm_resume, waiting on
> power.resume_complete will block IFF the device is suspended.

How are you going to wait for these things?  With wait_event?  Didn't 
you say above that it would be worse than using completions?

> A solution to the 2nd and 3rd problems would still be needed - a way
> to abort drivers that call device_pm_wait_for_dev when suspend is
> aborted, and a return value to tell them the device being waited on is
> not suspended.

No solutions are needed.  See above.

Alan Stern

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-03 14:04                           ` Alan Stern
@ 2010-09-03 16:48                             ` Colin Cross
  2010-09-03 17:31                               ` Alan Stern
  2010-09-03 17:31                               ` Alan Stern
  2010-09-03 16:48                             ` [PATCH] PM: Prevent waiting forever on asynchronous resume after abort Colin Cross
  1 sibling, 2 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-03 16:48 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, linux-kernel, linux-pm, Pavel Machek,
	Len Brown, Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Fri, Sep 3, 2010 at 7:04 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 2 Sep 2010, Colin Cross wrote:
>
>> You're right, wait_event would be much worse.
>>
>> I think there's another race condition during suspend.  If an
>> asynchronous device calls device_pm_wait_for_dev on a device that
>> hasn't had device_suspend called on it yet, power.completion will
>> still be set from initialization or the last time it completed resume,
>> and it won't wait.
>
> That can't happen in a properly-designed system.  It would mean the
> async device didn't suspend because it was waiting for a device which
> was registered before it -- and that would deadlock even if you used
> synchronous suspend.
I see - from the earlier thread, if devices need to break the tree
model for suspend, they still have to follow the list ordering.

>> Assuming that problem is fixed somehow, there's also a deadlock
>> possibility.  Consider 3 devices.  A, B, and C, registered in that
>> order.  A is async, and the suspend handler calls
>> device_pm_wait_for_dev(C).  B's suspend handler returns an error.  A's
>> suspend handler is now stuck waiting on C->power.completion, but
>> device_suspend(C) will never be called.
>
> Why not?  The normal suspend order is last-to-first, so C will be
> suspended before B.
Reverse A and C, but then the earlier comment applies.

>> There are also an unhandled edge condition - what is the expected
>> behavior for a call to device_pm_wait_for_dev on a device if the
>> suspend handler for that device returns an error?  Currently, the
>> calling device will continue as if the target device had suspended.
>
> It looks like __device_suspend needs to set async_error.  Which means
> async_suspend doesn't need to set it.  This is indeed a bug.
Is this sufficient?  The waiting device will complete its suspend
handler, and then be resumed, but the waited-on device never
suspended.  Are drivers expected to handle that case?

>> What about splitting power.completion into two flags,
>> power.suspend_complete and power.resume_complete?
>> power.resume_complete is initialized to 1, because the devices start
>> resumed.  Clear power.suspend_complete for all devices at the
>> beginning of dpm_suspend, and clear power.resume_complete for any
>> device that is suspended at the beginning of dpm_resume.  The
>> semantics of each flag is then always clear.  Any time between the
>> beginning and end of dpm_suspend, waiting on any device's
>> power.suspend_complete will block until that device is in suspend.
>> Any time between the beginning and end of dpm_resume, waiting on
>> power.resume_complete will block IFF the device is suspended.
>
> How are you going to wait for these things?  With wait_event?  Didn't
> you say above that it would be worse than using completions?
I'd have to complete them on errors to abort any waiters, which would
complicate the meaning of the completion flags, but keep the rules
above about the same.

>> A solution to the 2nd and 3rd problems would still be needed - a way
>> to abort drivers that call device_pm_wait_for_dev when suspend is
>> aborted, and a return value to tell them the device being waited on is
>> not suspended.
>
> No solutions are needed.  See above.
>
> Alan Stern
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-03 14:04                           ` Alan Stern
  2010-09-03 16:48                             ` Colin Cross
@ 2010-09-03 16:48                             ` Colin Cross
  1 sibling, 0 replies; 55+ messages in thread
From: Colin Cross @ 2010-09-03 16:48 UTC (permalink / raw)
  To: Alan Stern
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Fri, Sep 3, 2010 at 7:04 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 2 Sep 2010, Colin Cross wrote:
>
>> You're right, wait_event would be much worse.
>>
>> I think there's another race condition during suspend.  If an
>> asynchronous device calls device_pm_wait_for_dev on a device that
>> hasn't had device_suspend called on it yet, power.completion will
>> still be set from initialization or the last time it completed resume,
>> and it won't wait.
>
> That can't happen in a properly-designed system.  It would mean the
> async device didn't suspend because it was waiting for a device which
> was registered before it -- and that would deadlock even if you used
> synchronous suspend.
I see - from the earlier thread, if devices need to break the tree
model for suspend, they still have to follow the list ordering.

>> Assuming that problem is fixed somehow, there's also a deadlock
>> possibility.  Consider 3 devices.  A, B, and C, registered in that
>> order.  A is async, and the suspend handler calls
>> device_pm_wait_for_dev(C).  B's suspend handler returns an error.  A's
>> suspend handler is now stuck waiting on C->power.completion, but
>> device_suspend(C) will never be called.
>
> Why not?  The normal suspend order is last-to-first, so C will be
> suspended before B.
Reverse A and C, but then the earlier comment applies.

>> There are also an unhandled edge condition - what is the expected
>> behavior for a call to device_pm_wait_for_dev on a device if the
>> suspend handler for that device returns an error?  Currently, the
>> calling device will continue as if the target device had suspended.
>
> It looks like __device_suspend needs to set async_error.  Which means
> async_suspend doesn't need to set it.  This is indeed a bug.
Is this sufficient?  The waiting device will complete its suspend
handler, and then be resumed, but the waited-on device never
suspended.  Are drivers expected to handle that case?

>> What about splitting power.completion into two flags,
>> power.suspend_complete and power.resume_complete?
>> power.resume_complete is initialized to 1, because the devices start
>> resumed.  Clear power.suspend_complete for all devices at the
>> beginning of dpm_suspend, and clear power.resume_complete for any
>> device that is suspended at the beginning of dpm_resume.  The
>> semantics of each flag is then always clear.  Any time between the
>> beginning and end of dpm_suspend, waiting on any device's
>> power.suspend_complete will block until that device is in suspend.
>> Any time between the beginning and end of dpm_resume, waiting on
>> power.resume_complete will block IFF the device is suspended.
>
> How are you going to wait for these things?  With wait_event?  Didn't
> you say above that it would be worse than using completions?
I'd have to complete them on errors to abort any waiters, which would
complicate the meaning of the completion flags, but keep the rules
above about the same.

>> A solution to the 2nd and 3rd problems would still be needed - a way
>> to abort drivers that call device_pm_wait_for_dev when suspend is
>> aborted, and a return value to tell them the device being waited on is
>> not suspended.
>
> No solutions are needed.  See above.
>
> Alan Stern
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-03 16:48                             ` Colin Cross
  2010-09-03 17:31                               ` Alan Stern
@ 2010-09-03 17:31                               ` Alan Stern
  2010-09-16 20:36                                 ` [PATCH] PM: Fix potential issue with failing asynchronous suspend (was: Re: [PATCH] PM: Prevent waiting ...) Rafael J. Wysocki
  2010-09-16 20:36                                 ` Rafael J. Wysocki
  1 sibling, 2 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-03 17:31 UTC (permalink / raw)
  To: Colin Cross
  Cc: Rafael J. Wysocki, linux-kernel, linux-pm, Pavel Machek,
	Len Brown, Greg Kroah-Hartman, Randy Dunlap, Andrew Morton

On Fri, 3 Sep 2010, Colin Cross wrote:

> >> I think there's another race condition during suspend.  If an
> >> asynchronous device calls device_pm_wait_for_dev on a device that
> >> hasn't had device_suspend called on it yet, power.completion will
> >> still be set from initialization or the last time it completed resume,
> >> and it won't wait.
> >
> > That can't happen in a properly-designed system.  It would mean the
> > async device didn't suspend because it was waiting for a device which
> > was registered before it -- and that would deadlock even if you used
> > synchronous suspend.
> I see - from the earlier thread, if devices need to break the tree
> model for suspend, they still have to follow the list ordering.

Right.  After all, the user can force the system into doing a 
synchronous suspend whenever he wants.

> >> Assuming that problem is fixed somehow, there's also a deadlock
> >> possibility.  Consider 3 devices.  A, B, and C, registered in that
> >> order.  A is async, and the suspend handler calls
> >> device_pm_wait_for_dev(C).  B's suspend handler returns an error.  A's
> >> suspend handler is now stuck waiting on C->power.completion, but
> >> device_suspend(C) will never be called.
> >
> > Why not?  The normal suspend order is last-to-first, so C will be
> > suspended before B.
> Reverse A and C, but then the earlier comment applies.

Exactly.

> >> There are also an unhandled edge condition - what is the expected
> >> behavior for a call to device_pm_wait_for_dev on a device if the
> >> suspend handler for that device returns an error?  Currently, the
> >> calling device will continue as if the target device had suspended.
> >
> > It looks like __device_suspend needs to set async_error.  Which means
> > async_suspend doesn't need to set it.  This is indeed a bug.
> Is this sufficient?  The waiting device will complete its suspend
> handler, and then be resumed, but the waited-on device never
> suspended.  Are drivers expected to handle that case?

Sorry, my reply wasn't very good.  There are _two_ related problems:
drivers calling device_pm_wait_for_dev and also the internal call where
__device_suspend calls dpm_wait_for_children.  They're both subject to
this bug, and my comment referred to the second problem rather than the
one you raised.

The entire "if (error) {"  block can be moved from async_suspend to the
end of __device_suspend, with suitable adjustment to the string
constant in the error message.  At the same time,
device_pm_wait_for_dev should return async_error instead of returning
void.  Which means its callers will have to check the return value (I
don't think there are very many callers at the moment).  Together those 
changes should fix everything.

> >> What about splitting power.completion into two flags,
> >> power.suspend_complete and power.resume_complete?
> >> power.resume_complete is initialized to 1, because the devices start
> >> resumed.  Clear power.suspend_complete for all devices at the
> >> beginning of dpm_suspend, and clear power.resume_complete for any
> >> device that is suspended at the beginning of dpm_resume.  The
> >> semantics of each flag is then always clear.  Any time between the
> >> beginning and end of dpm_suspend, waiting on any device's
> >> power.suspend_complete will block until that device is in suspend.
> >> Any time between the beginning and end of dpm_resume, waiting on
> >> power.resume_complete will block IFF the device is suspended.
> >
> > How are you going to wait for these things?  With wait_event?  Didn't
> > you say above that it would be worse than using completions?
> I'd have to complete them on errors to abort any waiters, which would
> complicate the meaning of the completion flags, but keep the rules
> above about the same.

I don't exactly follow what you're proposing here, but on the whole it
seems more complex than the current code.  There doesn't seem to be any
advantage in changing.

Now, if you want to add some comments to the existing code explaining 
what the completions are waiting for and how they need to be 
initialized, I'm sure nobody would object.

Oh yes, and please fix that bug.

Alan Stern


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
  2010-09-03 16:48                             ` Colin Cross
@ 2010-09-03 17:31                               ` Alan Stern
  2010-09-03 17:31                               ` Alan Stern
  1 sibling, 0 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-03 17:31 UTC (permalink / raw)
  To: Colin Cross
  Cc: Randy Dunlap, Len Brown, Greg Kroah-Hartman, linux-kernel,
	linux-pm, Andrew Morton

On Fri, 3 Sep 2010, Colin Cross wrote:

> >> I think there's another race condition during suspend.  If an
> >> asynchronous device calls device_pm_wait_for_dev on a device that
> >> hasn't had device_suspend called on it yet, power.completion will
> >> still be set from initialization or the last time it completed resume,
> >> and it won't wait.
> >
> > That can't happen in a properly-designed system.  It would mean the
> > async device didn't suspend because it was waiting for a device which
> > was registered before it -- and that would deadlock even if you used
> > synchronous suspend.
> I see - from the earlier thread, if devices need to break the tree
> model for suspend, they still have to follow the list ordering.

Right.  After all, the user can force the system into doing a 
synchronous suspend whenever he wants.

> >> Assuming that problem is fixed somehow, there's also a deadlock
> >> possibility.  Consider 3 devices.  A, B, and C, registered in that
> >> order.  A is async, and the suspend handler calls
> >> device_pm_wait_for_dev(C).  B's suspend handler returns an error.  A's
> >> suspend handler is now stuck waiting on C->power.completion, but
> >> device_suspend(C) will never be called.
> >
> > Why not?  The normal suspend order is last-to-first, so C will be
> > suspended before B.
> Reverse A and C, but then the earlier comment applies.

Exactly.

> >> There are also an unhandled edge condition - what is the expected
> >> behavior for a call to device_pm_wait_for_dev on a device if the
> >> suspend handler for that device returns an error?  Currently, the
> >> calling device will continue as if the target device had suspended.
> >
> > It looks like __device_suspend needs to set async_error.  Which means
> > async_suspend doesn't need to set it.  This is indeed a bug.
> Is this sufficient?  The waiting device will complete its suspend
> handler, and then be resumed, but the waited-on device never
> suspended.  Are drivers expected to handle that case?

Sorry, my reply wasn't very good.  There are _two_ related problems:
drivers calling device_pm_wait_for_dev and also the internal call where
__device_suspend calls dpm_wait_for_children.  They're both subject to
this bug, and my comment referred to the second problem rather than the
one you raised.

The entire "if (error) {"  block can be moved from async_suspend to the
end of __device_suspend, with suitable adjustment to the string
constant in the error message.  At the same time,
device_pm_wait_for_dev should return async_error instead of returning
void.  Which means its callers will have to check the return value (I
don't think there are very many callers at the moment).  Together those 
changes should fix everything.

> >> What about splitting power.completion into two flags,
> >> power.suspend_complete and power.resume_complete?
> >> power.resume_complete is initialized to 1, because the devices start
> >> resumed.  Clear power.suspend_complete for all devices at the
> >> beginning of dpm_suspend, and clear power.resume_complete for any
> >> device that is suspended at the beginning of dpm_resume.  The
> >> semantics of each flag is then always clear.  Any time between the
> >> beginning and end of dpm_suspend, waiting on any device's
> >> power.suspend_complete will block until that device is in suspend.
> >> Any time between the beginning and end of dpm_resume, waiting on
> >> power.resume_complete will block IFF the device is suspended.
> >
> > How are you going to wait for these things?  With wait_event?  Didn't
> > you say above that it would be worse than using completions?
> I'd have to complete them on errors to abort any waiters, which would
> complicate the meaning of the completion flags, but keep the rules
> above about the same.

I don't exactly follow what you're proposing here, but on the whole it
seems more complex than the current code.  There doesn't seem to be any
advantage in changing.

Now, if you want to add some comments to the existing code explaining 
what the completions are waiting for and how they need to be 
initialized, I'm sure nobody would object.

Oh yes, and please fix that bug.

Alan Stern

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] PM: Fix potential issue with failing asynchronous suspend (was: Re: [PATCH] PM: Prevent waiting ...)
  2010-09-03 17:31                               ` Alan Stern
  2010-09-16 20:36                                 ` [PATCH] PM: Fix potential issue with failing asynchronous suspend (was: Re: [PATCH] PM: Prevent waiting ...) Rafael J. Wysocki
@ 2010-09-16 20:36                                 ` Rafael J. Wysocki
  2010-09-16 21:00                                   ` Alan Stern
  2010-09-16 21:00                                   ` Alan Stern
  1 sibling, 2 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-16 20:36 UTC (permalink / raw)
  To: Alan Stern; +Cc: Colin Cross, linux-kernel, linux-pm, Pavel Machek

On Friday, September 03, 2010, Alan Stern wrote:
> On Fri, 3 Sep 2010, Colin Cross wrote:
> 
> > >> I think there's another race condition during suspend.  If an
> > >> asynchronous device calls device_pm_wait_for_dev on a device that
> > >> hasn't had device_suspend called on it yet, power.completion will
> > >> still be set from initialization or the last time it completed resume,
> > >> and it won't wait.
> > >
> > > That can't happen in a properly-designed system.  It would mean the
> > > async device didn't suspend because it was waiting for a device which
> > > was registered before it -- and that would deadlock even if you used
> > > synchronous suspend.
> > I see - from the earlier thread, if devices need to break the tree
> > model for suspend, they still have to follow the list ordering.
> 
> Right.  After all, the user can force the system into doing a 
> synchronous suspend whenever he wants.
> 
> > >> Assuming that problem is fixed somehow, there's also a deadlock
> > >> possibility.  Consider 3 devices.  A, B, and C, registered in that
> > >> order.  A is async, and the suspend handler calls
> > >> device_pm_wait_for_dev(C).  B's suspend handler returns an error.  A's
> > >> suspend handler is now stuck waiting on C->power.completion, but
> > >> device_suspend(C) will never be called.
> > >
> > > Why not?  The normal suspend order is last-to-first, so C will be
> > > suspended before B.
> > Reverse A and C, but then the earlier comment applies.
> 
> Exactly.
> 
> > >> There are also an unhandled edge condition - what is the expected
> > >> behavior for a call to device_pm_wait_for_dev on a device if the
> > >> suspend handler for that device returns an error?  Currently, the
> > >> calling device will continue as if the target device had suspended.
> > >
> > > It looks like __device_suspend needs to set async_error.  Which means
> > > async_suspend doesn't need to set it.  This is indeed a bug.
> > Is this sufficient?  The waiting device will complete its suspend
> > handler, and then be resumed, but the waited-on device never
> > suspended.  Are drivers expected to handle that case?
> 
> Sorry, my reply wasn't very good.  There are _two_ related problems:
> drivers calling device_pm_wait_for_dev and also the internal call where
> __device_suspend calls dpm_wait_for_children.  They're both subject to
> this bug, and my comment referred to the second problem rather than the
> one you raised.
> 
> The entire "if (error) {"  block can be moved from async_suspend to the
> end of __device_suspend, with suitable adjustment to the string
> constant in the error message.

In fact this message belongs in async_suspend() (the "synchronous" code has
its own version).

> At the same time,
> device_pm_wait_for_dev should return async_error instead of returning
> void.  Which means its callers will have to check the return value (I
> don't think there are very many callers at the moment).  Together those 
> changes should fix everything.

So, I think the patch below should fix the issue.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Fix potential issue with failing asynchronous suspend

There is a potential issue with the asynchronous suspend code that
a device driver suspending asynchronously may not notice that it
should back off.  There are two failing scenarions, (1) when the
driver is waiting for a driver suspending synchronously to complete
and that second driver returns error code, in which case async_error
won't be set and the waiting driver will continue suspending and (2)
after the driver has called device_pm_wait_for_dev() and the waited
for driver returns error code, in which case the caller of
device_pm_wait_for_dev() will not know that there was an error and
will continue suspending.

To fix this issue make __device_suspend() set async_error, so
async_suspend() doesn't need to set it any more, and make
device_pm_wait_for_dev() return async_error, so that its callers
can check whether or not they should continue suspending.

No more changes are necessary, since device_pm_wait_for_dev() is
not used by any drivers' suspend routines at the moment.

Reported-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/base/power/main.c |   15 +++++++++------
 include/linux/pm.h        |    7 +++++--
 2 files changed, 14 insertions(+), 8 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -51,6 +51,8 @@ static pm_message_t pm_transition;
  */
 static bool transition_started;
 
+static int async_error;
+
 /**
  * device_pm_init - Initialize the PM-related part of a device object.
  * @dev: Device object being initialized.
@@ -602,6 +604,7 @@ static void dpm_resume(pm_message_t stat
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
 	pm_transition = state;
+	async_error = 0;
 
 	list_for_each_entry(dev, &dpm_list, power.entry) {
 		if (dev->power.status < DPM_OFF)
@@ -831,8 +834,6 @@ static int legacy_suspend(struct device 
 	return error;
 }
 
-static int async_error;
-
 /**
  * device_suspend - Execute "suspend" callbacks for given device.
  * @dev: Device to handle.
@@ -887,6 +888,9 @@ static int __device_suspend(struct devic
 	device_unlock(dev);
 	complete_all(&dev->power.completion);
 
+	if (error)
+		async_error = error;
+
 	return error;
 }
 
@@ -896,10 +900,8 @@ static void async_suspend(void *data, as
 	int error;
 
 	error = __device_suspend(dev, pm_transition, true);
-	if (error) {
+	if (error)
 		pm_dev_err(dev, pm_transition, " async", error);
-		async_error = error;
-	}
 
 	put_device(dev);
 }
@@ -1087,8 +1089,9 @@ EXPORT_SYMBOL_GPL(__suspend_report_resul
  * @dev: Device to wait for.
  * @subordinate: Device that needs to wait for @dev.
  */
-void device_pm_wait_for_dev(struct device *subordinate, struct device *dev)
+int device_pm_wait_for_dev(struct device *subordinate, struct device *dev)
 {
 	dpm_wait(dev, subordinate->power.async_suspend);
+	return async_error;
 }
 EXPORT_SYMBOL_GPL(device_pm_wait_for_dev);
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -559,7 +559,7 @@ extern void __suspend_report_result(cons
 		__suspend_report_result(__func__, fn, ret);		\
 	} while (0)
 
-extern void device_pm_wait_for_dev(struct device *sub, struct device *dev);
+extern int device_pm_wait_for_dev(struct device *sub, struct device *dev);
 #else /* !CONFIG_PM_SLEEP */
 
 #define device_pm_lock() do {} while (0)
@@ -572,7 +572,10 @@ static inline int dpm_suspend_start(pm_m
 
 #define suspend_report_result(fn, ret)		do {} while (0)
 
-static inline void device_pm_wait_for_dev(struct device *a, struct device *b) {}
+static inline int device_pm_wait_for_dev(struct device *a, struct device *b)
+{
+	return 0;
+}
 #endif /* !CONFIG_PM_SLEEP */
 
 /* How to reorder dpm_list after device_move() */

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] PM: Fix potential issue with failing asynchronous suspend (was: Re: [PATCH] PM: Prevent waiting ...)
  2010-09-03 17:31                               ` Alan Stern
@ 2010-09-16 20:36                                 ` Rafael J. Wysocki
  2010-09-16 20:36                                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-16 20:36 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-pm, linux-kernel, Colin Cross

On Friday, September 03, 2010, Alan Stern wrote:
> On Fri, 3 Sep 2010, Colin Cross wrote:
> 
> > >> I think there's another race condition during suspend.  If an
> > >> asynchronous device calls device_pm_wait_for_dev on a device that
> > >> hasn't had device_suspend called on it yet, power.completion will
> > >> still be set from initialization or the last time it completed resume,
> > >> and it won't wait.
> > >
> > > That can't happen in a properly-designed system.  It would mean the
> > > async device didn't suspend because it was waiting for a device which
> > > was registered before it -- and that would deadlock even if you used
> > > synchronous suspend.
> > I see - from the earlier thread, if devices need to break the tree
> > model for suspend, they still have to follow the list ordering.
> 
> Right.  After all, the user can force the system into doing a 
> synchronous suspend whenever he wants.
> 
> > >> Assuming that problem is fixed somehow, there's also a deadlock
> > >> possibility.  Consider 3 devices.  A, B, and C, registered in that
> > >> order.  A is async, and the suspend handler calls
> > >> device_pm_wait_for_dev(C).  B's suspend handler returns an error.  A's
> > >> suspend handler is now stuck waiting on C->power.completion, but
> > >> device_suspend(C) will never be called.
> > >
> > > Why not?  The normal suspend order is last-to-first, so C will be
> > > suspended before B.
> > Reverse A and C, but then the earlier comment applies.
> 
> Exactly.
> 
> > >> There are also an unhandled edge condition - what is the expected
> > >> behavior for a call to device_pm_wait_for_dev on a device if the
> > >> suspend handler for that device returns an error?  Currently, the
> > >> calling device will continue as if the target device had suspended.
> > >
> > > It looks like __device_suspend needs to set async_error.  Which means
> > > async_suspend doesn't need to set it.  This is indeed a bug.
> > Is this sufficient?  The waiting device will complete its suspend
> > handler, and then be resumed, but the waited-on device never
> > suspended.  Are drivers expected to handle that case?
> 
> Sorry, my reply wasn't very good.  There are _two_ related problems:
> drivers calling device_pm_wait_for_dev and also the internal call where
> __device_suspend calls dpm_wait_for_children.  They're both subject to
> this bug, and my comment referred to the second problem rather than the
> one you raised.
> 
> The entire "if (error) {"  block can be moved from async_suspend to the
> end of __device_suspend, with suitable adjustment to the string
> constant in the error message.

In fact this message belongs in async_suspend() (the "synchronous" code has
its own version).

> At the same time,
> device_pm_wait_for_dev should return async_error instead of returning
> void.  Which means its callers will have to check the return value (I
> don't think there are very many callers at the moment).  Together those 
> changes should fix everything.

So, I think the patch below should fix the issue.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Fix potential issue with failing asynchronous suspend

There is a potential issue with the asynchronous suspend code that
a device driver suspending asynchronously may not notice that it
should back off.  There are two failing scenarions, (1) when the
driver is waiting for a driver suspending synchronously to complete
and that second driver returns error code, in which case async_error
won't be set and the waiting driver will continue suspending and (2)
after the driver has called device_pm_wait_for_dev() and the waited
for driver returns error code, in which case the caller of
device_pm_wait_for_dev() will not know that there was an error and
will continue suspending.

To fix this issue make __device_suspend() set async_error, so
async_suspend() doesn't need to set it any more, and make
device_pm_wait_for_dev() return async_error, so that its callers
can check whether or not they should continue suspending.

No more changes are necessary, since device_pm_wait_for_dev() is
not used by any drivers' suspend routines at the moment.

Reported-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/base/power/main.c |   15 +++++++++------
 include/linux/pm.h        |    7 +++++--
 2 files changed, 14 insertions(+), 8 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -51,6 +51,8 @@ static pm_message_t pm_transition;
  */
 static bool transition_started;
 
+static int async_error;
+
 /**
  * device_pm_init - Initialize the PM-related part of a device object.
  * @dev: Device object being initialized.
@@ -602,6 +604,7 @@ static void dpm_resume(pm_message_t stat
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
 	pm_transition = state;
+	async_error = 0;
 
 	list_for_each_entry(dev, &dpm_list, power.entry) {
 		if (dev->power.status < DPM_OFF)
@@ -831,8 +834,6 @@ static int legacy_suspend(struct device 
 	return error;
 }
 
-static int async_error;
-
 /**
  * device_suspend - Execute "suspend" callbacks for given device.
  * @dev: Device to handle.
@@ -887,6 +888,9 @@ static int __device_suspend(struct devic
 	device_unlock(dev);
 	complete_all(&dev->power.completion);
 
+	if (error)
+		async_error = error;
+
 	return error;
 }
 
@@ -896,10 +900,8 @@ static void async_suspend(void *data, as
 	int error;
 
 	error = __device_suspend(dev, pm_transition, true);
-	if (error) {
+	if (error)
 		pm_dev_err(dev, pm_transition, " async", error);
-		async_error = error;
-	}
 
 	put_device(dev);
 }
@@ -1087,8 +1089,9 @@ EXPORT_SYMBOL_GPL(__suspend_report_resul
  * @dev: Device to wait for.
  * @subordinate: Device that needs to wait for @dev.
  */
-void device_pm_wait_for_dev(struct device *subordinate, struct device *dev)
+int device_pm_wait_for_dev(struct device *subordinate, struct device *dev)
 {
 	dpm_wait(dev, subordinate->power.async_suspend);
+	return async_error;
 }
 EXPORT_SYMBOL_GPL(device_pm_wait_for_dev);
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -559,7 +559,7 @@ extern void __suspend_report_result(cons
 		__suspend_report_result(__func__, fn, ret);		\
 	} while (0)
 
-extern void device_pm_wait_for_dev(struct device *sub, struct device *dev);
+extern int device_pm_wait_for_dev(struct device *sub, struct device *dev);
 #else /* !CONFIG_PM_SLEEP */
 
 #define device_pm_lock() do {} while (0)
@@ -572,7 +572,10 @@ static inline int dpm_suspend_start(pm_m
 
 #define suspend_report_result(fn, ret)		do {} while (0)
 
-static inline void device_pm_wait_for_dev(struct device *a, struct device *b) {}
+static inline int device_pm_wait_for_dev(struct device *a, struct device *b)
+{
+	return 0;
+}
 #endif /* !CONFIG_PM_SLEEP */
 
 /* How to reorder dpm_list after device_move() */

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Fix potential issue with failing asynchronous suspend (was: Re: [PATCH] PM: Prevent waiting ...)
  2010-09-16 20:36                                 ` Rafael J. Wysocki
@ 2010-09-16 21:00                                   ` Alan Stern
  2010-09-16 21:24                                     ` Rafael J. Wysocki
  2010-09-16 21:24                                     ` Rafael J. Wysocki
  2010-09-16 21:00                                   ` Alan Stern
  1 sibling, 2 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-16 21:00 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Colin Cross, linux-kernel, linux-pm, Pavel Machek

On Thu, 16 Sep 2010, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM: Fix potential issue with failing asynchronous suspend
> 
> There is a potential issue with the asynchronous suspend code that
> a device driver suspending asynchronously may not notice that it
> should back off.  There are two failing scenarions, (1) when the
> driver is waiting for a driver suspending synchronously to complete
> and that second driver returns error code, in which case async_error
> won't be set and the waiting driver will continue suspending and (2)
> after the driver has called device_pm_wait_for_dev() and the waited
> for driver returns error code, in which case the caller of
> device_pm_wait_for_dev() will not know that there was an error and
> will continue suspending.
> 
> To fix this issue make __device_suspend() set async_error, so
> async_suspend() doesn't need to set it any more, and make
> device_pm_wait_for_dev() return async_error, so that its callers
> can check whether or not they should continue suspending.
> 
> No more changes are necessary, since device_pm_wait_for_dev() is
> not used by any drivers' suspend routines at the moment.

You just squeaked by, since it is _is_ being used by a USB _resume_
routine.  :-)

Alan Stern


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Fix potential issue with failing asynchronous suspend (was: Re: [PATCH] PM: Prevent waiting ...)
  2010-09-16 20:36                                 ` Rafael J. Wysocki
  2010-09-16 21:00                                   ` Alan Stern
@ 2010-09-16 21:00                                   ` Alan Stern
  1 sibling, 0 replies; 55+ messages in thread
From: Alan Stern @ 2010-09-16 21:00 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-pm, linux-kernel, Colin Cross

On Thu, 16 Sep 2010, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM: Fix potential issue with failing asynchronous suspend
> 
> There is a potential issue with the asynchronous suspend code that
> a device driver suspending asynchronously may not notice that it
> should back off.  There are two failing scenarions, (1) when the
> driver is waiting for a driver suspending synchronously to complete
> and that second driver returns error code, in which case async_error
> won't be set and the waiting driver will continue suspending and (2)
> after the driver has called device_pm_wait_for_dev() and the waited
> for driver returns error code, in which case the caller of
> device_pm_wait_for_dev() will not know that there was an error and
> will continue suspending.
> 
> To fix this issue make __device_suspend() set async_error, so
> async_suspend() doesn't need to set it any more, and make
> device_pm_wait_for_dev() return async_error, so that its callers
> can check whether or not they should continue suspending.
> 
> No more changes are necessary, since device_pm_wait_for_dev() is
> not used by any drivers' suspend routines at the moment.

You just squeaked by, since it is _is_ being used by a USB _resume_
routine.  :-)

Alan Stern

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Fix potential issue with failing asynchronous suspend (was: Re: [PATCH] PM: Prevent waiting ...)
  2010-09-16 21:00                                   ` Alan Stern
  2010-09-16 21:24                                     ` Rafael J. Wysocki
@ 2010-09-16 21:24                                     ` Rafael J. Wysocki
  1 sibling, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-16 21:24 UTC (permalink / raw)
  To: Alan Stern; +Cc: Colin Cross, linux-kernel, linux-pm, Pavel Machek

On Thursday, September 16, 2010, Alan Stern wrote:
> On Thu, 16 Sep 2010, Rafael J. Wysocki wrote:
> 
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM: Fix potential issue with failing asynchronous suspend
> > 
> > There is a potential issue with the asynchronous suspend code that
> > a device driver suspending asynchronously may not notice that it
> > should back off.  There are two failing scenarions, (1) when the
> > driver is waiting for a driver suspending synchronously to complete
> > and that second driver returns error code, in which case async_error
> > won't be set and the waiting driver will continue suspending and (2)
> > after the driver has called device_pm_wait_for_dev() and the waited
> > for driver returns error code, in which case the caller of
> > device_pm_wait_for_dev() will not know that there was an error and
> > will continue suspending.
> > 
> > To fix this issue make __device_suspend() set async_error, so
> > async_suspend() doesn't need to set it any more, and make
> > device_pm_wait_for_dev() return async_error, so that its callers
> > can check whether or not they should continue suspending.
> > 
> > No more changes are necessary, since device_pm_wait_for_dev() is
> > not used by any drivers' suspend routines at the moment.
> 
> You just squeaked by, since it is _is_ being used by a USB _resume_
> routine.  :-)

I know that, but resume actually doesn't need to care about async_error.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] PM: Fix potential issue with failing asynchronous suspend (was: Re: [PATCH] PM: Prevent waiting ...)
  2010-09-16 21:00                                   ` Alan Stern
@ 2010-09-16 21:24                                     ` Rafael J. Wysocki
  2010-09-16 21:24                                     ` Rafael J. Wysocki
  1 sibling, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2010-09-16 21:24 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-pm, linux-kernel, Colin Cross

On Thursday, September 16, 2010, Alan Stern wrote:
> On Thu, 16 Sep 2010, Rafael J. Wysocki wrote:
> 
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM: Fix potential issue with failing asynchronous suspend
> > 
> > There is a potential issue with the asynchronous suspend code that
> > a device driver suspending asynchronously may not notice that it
> > should back off.  There are two failing scenarions, (1) when the
> > driver is waiting for a driver suspending synchronously to complete
> > and that second driver returns error code, in which case async_error
> > won't be set and the waiting driver will continue suspending and (2)
> > after the driver has called device_pm_wait_for_dev() and the waited
> > for driver returns error code, in which case the caller of
> > device_pm_wait_for_dev() will not know that there was an error and
> > will continue suspending.
> > 
> > To fix this issue make __device_suspend() set async_error, so
> > async_suspend() doesn't need to set it any more, and make
> > device_pm_wait_for_dev() return async_error, so that its callers
> > can check whether or not they should continue suspending.
> > 
> > No more changes are necessary, since device_pm_wait_for_dev() is
> > not used by any drivers' suspend routines at the moment.
> 
> You just squeaked by, since it is _is_ being used by a USB _resume_
> routine.  :-)

I know that, but resume actually doesn't need to care about async_error.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2010-09-16 21:25 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-02  2:54 [PATCH] PM: Prevent waiting forever on asynchronous resume after abort Colin Cross
2010-09-02 13:50 ` Alan Stern
2010-09-02 13:50 ` Alan Stern
2010-09-02 19:46   ` Rafael J. Wysocki
2010-09-02 19:46   ` Rafael J. Wysocki
2010-09-02 20:24     ` Colin Cross
2010-09-02 20:30       ` Rafael J. Wysocki
2010-09-02 20:30       ` Rafael J. Wysocki
2010-09-02 20:45       ` Alan Stern
2010-09-02 20:45       ` Alan Stern
2010-09-02 21:01         ` Colin Cross
2010-09-02 21:01         ` Colin Cross
2010-09-02 21:06           ` Rafael J. Wysocki
2010-09-02 21:06             ` Rafael J. Wysocki
2010-09-02 21:34           ` Alan Stern
2010-09-02 22:45             ` Colin Cross
2010-09-02 23:09               ` Rafael J. Wysocki
2010-09-03  0:14                 ` Colin Cross
2010-09-03  0:14                 ` Colin Cross
2010-09-03  0:35                   ` Rafael J. Wysocki
2010-09-03  0:35                   ` Rafael J. Wysocki
2010-09-03  1:54                     ` Colin Cross
2010-09-03  1:54                     ` Colin Cross
2010-09-03  2:42                       ` Alan Stern
2010-09-03  4:30                         ` Colin Cross
2010-09-03 14:04                           ` Alan Stern
2010-09-03 16:48                             ` Colin Cross
2010-09-03 17:31                               ` Alan Stern
2010-09-03 17:31                               ` Alan Stern
2010-09-16 20:36                                 ` [PATCH] PM: Fix potential issue with failing asynchronous suspend (was: Re: [PATCH] PM: Prevent waiting ...) Rafael J. Wysocki
2010-09-16 20:36                                 ` Rafael J. Wysocki
2010-09-16 21:00                                   ` Alan Stern
2010-09-16 21:24                                     ` Rafael J. Wysocki
2010-09-16 21:24                                     ` Rafael J. Wysocki
2010-09-16 21:00                                   ` Alan Stern
2010-09-03 16:48                             ` [PATCH] PM: Prevent waiting forever on asynchronous resume after abort Colin Cross
2010-09-03 14:04                           ` Alan Stern
2010-09-03  4:30                         ` Colin Cross
2010-09-03  2:42                       ` Alan Stern
2010-09-02 23:09               ` Rafael J. Wysocki
2010-09-02 22:45             ` Colin Cross
2010-09-02 21:34           ` Alan Stern
2010-09-02 21:05       ` Rafael J. Wysocki
2010-09-02 21:05         ` Rafael J. Wysocki
2010-09-02 21:31         ` Colin Cross
2010-09-02 21:40           ` Rafael J. Wysocki
2010-09-02 22:59             ` [PATCH v2] " Colin Cross
2010-09-02 23:25               ` Rafael J. Wysocki
2010-09-02 23:25                 ` Rafael J. Wysocki
2010-09-02 22:59             ` Colin Cross
2010-09-02 21:40           ` [PATCH] " Rafael J. Wysocki
2010-09-02 21:31         ` Colin Cross
2010-09-02 20:24     ` Colin Cross
2010-09-02 20:27     ` Alan Stern
2010-09-02 20:27     ` Alan Stern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.