* [PATCH] PCI / PM: Block races between runtime PM and system sleep
@ 2011-06-19 19:49 Rafael J. Wysocki
  2011-06-20 14:46 ` Alan Stern
  2011-06-20 14:46 ` Alan Stern
  0 siblings, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-06-19 19:49 UTC (permalink / raw)
  To: Linux PM mailing list; +Cc: LKML, Alan Stern, Jesse Barnes, linux-pci

From: Rafael J. Wysocki <rjw@sisk.pl>

After commit e8665002477f0278f84f898145b1f141ba26ee26
(PM: Allow pm_runtime_suspend() to succeed during system suspend) it
is possible that a device resumed by the pm_runtime_resume(dev) in
pci_pm_prepare() will be suspended immediately from a work item,
timer function or otherwise, defeating the very purpose of calling
pm_runtime_resume(dev) from there.  To prevent that from happening
it is necessary to increment the runtime PM usage counter of the
device by replacing pm_runtime_resume() with pm_runtime_get_sync().
Moreover, the incremented runtime PM usage counter has to be
decremented by the corresponding pci_pm_complete(), via
pm_runtime_put_noidle().

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@kernel.org
---
 drivers/pci/pci-driver.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -624,7 +624,7 @@ static int pci_pm_prepare(struct device
 	 * system from the sleep state, we'll have to prevent it from signaling
 	 * wake-up.
 	 */
-	pm_runtime_resume(dev);
+	pm_runtime_get_sync(dev);
 
 	if (drv && drv->pm && drv->pm->prepare)
 		error = drv->pm->prepare(dev);
@@ -638,6 +638,8 @@ static void pci_pm_complete(struct devic
 
 	if (drv && drv->pm && drv->pm->complete)
 		drv->pm->complete(dev);
+
+	pm_runtime_put_noidle(dev);
 }
 
 #else /* !CONFIG_PM_SLEEP */


* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-19 19:49 [PATCH] PCI / PM: Block races between runtime PM and system sleep Rafael J. Wysocki
  2011-06-20 14:46 ` Alan Stern
@ 2011-06-20 14:46 ` Alan Stern
  2011-06-20 19:42   ` Rafael J. Wysocki
  2011-06-20 19:42   ` Rafael J. Wysocki
  1 sibling, 2 replies; 36+ messages in thread
From: Alan Stern @ 2011-06-20 14:46 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci

On Sun, 19 Jun 2011, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> After commit e8665002477f0278f84f898145b1f141ba26ee26
> (PM: Allow pm_runtime_suspend() to succeed during system suspend) it
> is possible that a device resumed by the pm_runtime_resume(dev) in
> pci_pm_prepare() will be suspended immediately from a work item,
> timer function or otherwise, defeating the very purpose of calling
> pm_runtime_resume(dev) from there.  To prevent that from happening
> it is necessary to increment the runtime PM usage counter of the
> device by replacing pm_runtime_resume() with pm_runtime_get_sync().
> Moreover, the incremented runtime PM usage counter has to be
> decremented by the corresponding pci_pm_complete(), via
> pm_runtime_put_noidle().

In both this and the previous patch, the final decrement should be done 
by pm_runtime_put_sync() instead of pm_runtime_put_noidle().  Otherwise
you face the possibility that the usage_count may go to 0 but the 
device will be left active.
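
To illustrate the difference (a rough sketch only, not code from the patch;
the helper name here is made up):

#include <linux/device.h>
#include <linux/pm_runtime.h>

/*
 * pm_runtime_put_noidle() only decrements the usage counter, so a device
 * whose counter drops to zero here stays active until something else runs
 * an idle check.  pm_runtime_put_sync() decrements the counter and runs
 * the idle check synchronously, so an unused device can be runtime
 * suspended again right away.
 */
static void sketch_drop_reference(struct device *dev, bool idle_check)
{
        if (idle_check)
                pm_runtime_put_sync(dev);    /* decrement + synchronous idle check */
        else
                pm_runtime_put_noidle(dev);  /* decrement only */
}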

Furthermore, since we're going to disable runtime PM as soon as the
suspend callback returns anyway, why not increment usage_count before
invoking the callback?  This will prevent runtime suspends from 
occurring while the callback runs, so no changes will be needed in the 
PCI or USB subsystems.

It also will prevent Kevin from calling pm_runtime_suspend from within
his suspend callbacks, but you have already determined that subsystems
and drivers should never do that in any case.

Alan Stern


* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-20 14:46 ` Alan Stern
  2011-06-20 19:42   ` Rafael J. Wysocki
@ 2011-06-20 19:42   ` Rafael J. Wysocki
  2011-06-20 21:00     ` Alan Stern
  2011-06-20 21:00     ` Alan Stern
  1 sibling, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-06-20 19:42 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci

On Monday, June 20, 2011, Alan Stern wrote:
> On Sun, 19 Jun 2011, Rafael J. Wysocki wrote:
> 
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > After commit e8665002477f0278f84f898145b1f141ba26ee26
> > (PM: Allow pm_runtime_suspend() to succeed during system suspend) it
> > is possible that a device resumed by the pm_runtime_resume(dev) in
> > pci_pm_prepare() will be suspended immediately from a work item,
> > timer function or otherwise, defeating the very purpose of calling
> > pm_runtime_resume(dev) from there.  To prevent that from happening
> > it is necessary to increment the runtime PM usage counter of the
> > device by replacing pm_runtime_resume() with pm_runtime_get_sync().
> > Moreover, the incremented runtime PM usage counter has to be
> > decremented by the corresponding pci_pm_complete(), via
> > pm_runtime_put_noidle().
> 
> In both this and the previous patch, the final decrement should be done 
> by pm_runtime_put_sync() instead of pm_runtime_put_noidle().  Otherwise
> you face the possibility that the usage_count may go to 0 but the 
> device will be left active.

OK, that's how the old code worked, BTW; I overlooked that.

> Furthermore, since we're going to disable runtime PM as soon as the
> suspend callback returns anyway, why not increment usage_count before
> invoking the callback?  This will prevent runtime suspends from 
> occurring while the callback runs, so no changes will be needed in the 
> PCI or USB subsystems.

The PCI case is different from the USB one.  PCI needs to resume devices
before calling their drivers' .suspend() callbacks, so it does that in
.prepare().  If the core acquired a reference to every device before executing
the subsystem .suspend(), then pm_runtime_resume() could be moved from
pci_pm_prepare() to pci_pm_suspend(), but then additionally it would
have to be called from pci_pm_freeze() and pci_pm_poweroff().  It simply is
more efficient to call it once from pci_pm_prepare(), but then PCI needs to
take the reference by itself.
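
Roughly, the trade-off looks like this (just a sketch to show the shape of
it, not real pci-driver.c code; the sketch_* names are made up):

#include <linux/device.h>
#include <linux/pm_runtime.h>

/*
 * .prepare() runs once at the start of every sleep transition (suspend,
 * hibernation freeze, poweroff), so a single pm_runtime_get_sync() there
 * covers all three paths ...
 */
static int sketch_pci_pm_prepare(struct device *dev)
{
        pm_runtime_get_sync(dev);   /* resume the device and hold a reference */
        return 0;
}

/*
 * ... whereas doing the resume in the suspend path instead would mean
 * repeating it in pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff().
 */
static int sketch_pci_pm_suspend(struct device *dev)
{
        pm_runtime_resume(dev);     /* would be needed here and in the
                                     * freeze and poweroff callbacks too */
        return 0;
}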

Also, the core doesn't call the subsystem-level .runtime_idle() after the
subsystem-level .complete() has run, which is useful as you pointed out
above. :-)

> It also will prevent Kevin from calling pm_runtime_suspend from within
> his suspend callbacks, but you have already determined that subsystems
> and drivers should never do that in any case.

Then reverting commit e8665002477f0278f84f898145b1f141ba26ee26 would be
even better. :-)

Thanks,
Rafael

* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-20 19:42   ` Rafael J. Wysocki
@ 2011-06-20 21:00     ` Alan Stern
  2011-06-20 21:28       ` Rafael J. Wysocki
  2011-06-20 21:28       ` Rafael J. Wysocki
  2011-06-20 21:00     ` Alan Stern
  1 sibling, 2 replies; 36+ messages in thread
From: Alan Stern @ 2011-06-20 21:00 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci

On Mon, 20 Jun 2011, Rafael J. Wysocki wrote:

> > Furthermore, since we're going to disable runtime PM as soon as the
> > suspend callback returns anyway, why not increment usage_count before
> > invoking the callback?  This will prevent runtime suspends from 
> > occurring while the callback runs, so no changes will be needed in the 
> > PCI or USB subsystems.
> 
> The PCI case is different from the USB one.  PCI needs to resume devices
> before calling their drivers' .suspend() callbacks, so it does that in
> .prepare().  If the core acquired a reference to every device before executing
> the subsystem .suspend(), then pm_runtime_resume() could be moved from
> pci_pm_prepare() to pci_pm_suspend(), but then additionally it would
> have to be called from pci_pm_freeze() and pci_pm_poweroff().  It simply is
> more efficient to call it once from pci_pm_prepare(), but then PCI needs to
> take the reference by itself.

Ah, okay.  The PCI part makes sense then.

> > It also will prevent Kevin from calling pm_runtime_suspend from within
> > his suspend callbacks, but you have already determined that subsystems
> > and drivers should never do that in any case.
> 
> Then reverting commit e8665002477f0278f84f898145b1f141ba26ee26 would be
> even better. :-)

See below.


> > As I see it, we never want a suspend or suspend_noirq callback to call 
> > pm_runtime_suspend().  However it's okay for the suspend callback to 
> > invoke pm_runtime_resume(), as long as this is all done in subsystem 
> > code.
> 
> First off, I don't really see a reason for a subsystem to call
> pm_runtime_resume() from its .suspend_noirq() callback.

I was referring to .suspend(), not .suspend_noirq().

>  Now, if
> pm_runtime_resume() is to be called concurrently with the subsystem's
> .suspend_noirq() callback, I'd rather not let that happen. :-)

Me too.  But I see no reason to prevent pm_runtime_resume() from being 
called by .suspend().

> > And in between the prepare and suspend callbacks, runtime PM should be
> > more or less fully functional, right?  For most devices it will never
> > be triggered, because it has to run in process context and both
> > userspace and pm_wq are frozen.  It may trigger for devices marked as
> > IRQ-safe, though.
> 
> It also may trigger for drivers using non-freezable workqueues and calling
> runtime PM synchronously from there.

Right.  So we shouldn't ignore this window.

> > Maybe the barrier should be moved into __device_suspend().
> 
> I _really_ think that the initial approach, i.e. before commit
> e8665002477f0278f84f898145b1f141ba26ee26, made the most sense.  It didn't
> cover the "pm_runtime_resume() called during system suspend" case, but
> it did cover everything else.

But it prevented runtime PM from working during the window between 
.prepare() and .suspend(), and also between .resume() and .complete().
If a subsystem like PCI wants to rule out runtime PM during those 
windows, then fine -- it can do whatever it wants.  But the PM core 
shouldn't do this.

> So, I think there are serious technical arguments for reverting that commit.
> 
> I think we went really far trying to avoid that, but I'm not sure I want to go
> any further.

What I'm suggesting is to revert the commit but at the same time,
move the get_noresume() into __device_suspend() and the put_sync() into 
device_resume().

Alan Stern


* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-20 21:00     ` Alan Stern
  2011-06-20 21:28       ` Rafael J. Wysocki
@ 2011-06-20 21:28       ` Rafael J. Wysocki
  2011-06-21 14:52         ` Alan Stern
  2011-06-21 14:52         ` Alan Stern
  1 sibling, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-06-20 21:28 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci

On Monday, June 20, 2011, Alan Stern wrote:
> On Mon, 20 Jun 2011, Rafael J. Wysocki wrote:
> 
> > > Furthermore, since we're going to disable runtime PM as soon as the
> > > suspend callback returns anyway, why not increment usage_count before
> > > invoking the callback?  This will prevent runtime suspends from 
> > > occurring while the callback runs, so no changes will be needed in the 
> > > PCI or USB subsystems.
> > 
> > The PCI case is different from the USB one.  PCI needs to resume devices
> > before calling their drivers' .suspend() callbacks, so it does that in
> > .prepare().  If the core acquired a reference to every device before executing
> > the subsystem .suspend(), then pm_runtime_resume() could be moved from
> > pci_pm_prepare() to pci_pm_suspend(), but then additionally it would
> > have to be called from pci_pm_freeze() and pci_pm_poweroff().  It simply is
> > more efficient to call it once from pci_pm_prepare(), but then PCI needs to
> > take the reference by itself.
> 
> Ah, okay.  The PCI part makes sense then.

OK, so the appended patch is a modification of the $subject one using
pm_runtime_put_sync() instead of pm_runtime_put_noidle().

> > > It also will prevent Kevin from calling pm_runtime_suspend from within
> > > his suspend callbacks, but you have already determined that subsystems
> > > and drivers should never do that in any case.
> > 
> > Then reverting commit e8665002477f0278f84f898145b1f141ba26ee26 would be
> > even better. :-)
> 
> See below.
> 
> 
> > > As I see it, we never want a suspend or suspend_noirq callback to call 
> > > pm_runtime_suspend().  However it's okay for the suspend callback to 
> > > invoke pm_runtime_resume(), as long as this is all done in subsystem 
> > > code.
> > 
> > First off, I don't really see a reason for a subsystem to call
> > pm_runtime_resume() from its .suspend_noirq() callback.
> 
> I was referring to .suspend(), not .suspend_noirq().
> 
> >  Now, if
> > pm_runtime_resume() is to be called concurrently with the subsystem's
> > .suspend_noirq() callback, I'd rather not let that happen. :-)
> 
> Me too.  But I see no reason to prevent pm_runtime_resume() from being 
> called by .suspend().

Hmm.

> > > And in between the prepare and suspend callbacks, runtime PM should be
> > > more or less fully functional, right?  For most devices it will never
> > > be triggered, because it has to run in process context and both
> > > userspace and pm_wq are frozen.  It may trigger for devices marked as
> > > IRQ-safe, though.
> > 
> > It also may trigger for drivers using non-freezable workqueues and calling
> > runtime PM synchronously from there.
> 
> Right.  So we shouldn't ignore this window.

So, your point is that while .suspend() or .resume() are running, the
synchronization between runtime PM and system suspend/resume should be the
subsystem's problem, right?

> > > Maybe the barrier should be moved into __device_suspend().
> > 
> > I _really_ think that the initial approach, i.e. before commit
> > e8665002477f0278f84f898145b1f141ba26ee26, made the most sense.  It didn't
> > cover the "pm_runtime_resume() called during system suspend" case, but
> > it did cover everything else.
> 
> But it prevented runtime PM from working during the window between 
> .prepare() and .suspend(), and also between .resume() and .complete().
> If a subsystem like PCI wants to rule out runtime PM during those 
> windows, then fine -- it can do whatever it wants.  But the PM core 
> shouldn't do this.

I actually see a reason for doing this.  Namely, I don't really think
driver writers should be bothered with preventing races between different
PM callbacks from happening.  Runtime PM takes care of that at run time,
the design of the system suspend/resume code ensures that the callbacks
for the same device are executed sequentially, but if we allow runtime PM
callbacks to be executed in parallel with system suspend/resume callbacks,
someone has to prevent those callbacks from racing with each other.

Now, if you agree that that shouldn't be a driver's task, then it has to
be the subsystem's one and I'm not sure what a subsystem can do other than
disabling runtime PM or at least taking a reference on every device before
calling device drivers' .suspend() callbacks.

Please note, I think that .prepare() and .complete() are somewhat special,
so perhaps we should allow those to race with runtime PM callbacks, but IMO
allowing .suspend() and .resume() to race with .runtime_suspend() and
.runtime_resume() is not a good idea.

> > So, I think there are serious technical arguments for reverting that commit.
> > 
> > I think we went really far trying to avoid that, but I'm not sure I want to go
> > any further.
> 
> What I'm suggesting is to revert the commit but at the same time,
> move the get_noresume() into __device_suspend() and the put_sync() into 
> device_resume().

What about doing pm_runtime_get_noresume() and the pm_runtime_barrier()
in dpm_prepare(), but _after_ calling device_prepare() and doing
pm_runtime_put_noidle() in dpm_complete() _before_ calling .complete()
from the subsystem (a _put_sync() at this point will likely invoke
.runtime_idle() from the subsystem before executing .complete(), which may
not be desirable)?
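
Schematically, the idea is something like this (a sketch only; the sketch_*
helpers are made-up stand-ins for the per-device loop bodies of dpm_prepare()
and dpm_complete() in drivers/base/power/main.c, and error handling is
omitted):

#include <linux/device.h>
#include <linux/pm_runtime.h>

/* take the reference after the subsystem's .prepare() has run and flush
 * any pending runtime PM requests */
static int sketch_prepare_one(struct device *dev, pm_message_t state)
{
        int error = device_prepare(dev, state);  /* subsystem .prepare() */

        if (!error) {
                pm_runtime_get_noresume(dev);
                pm_runtime_barrier(dev);
        }
        return error;
}

/* drop the reference, without an idle check, before the subsystem's
 * .complete() runs */
static void sketch_complete_one(struct device *dev, pm_message_t state)
{
        pm_runtime_put_noidle(dev);
        device_complete(dev, state);             /* subsystem .complete() */
}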

Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PCI / PM: Block races between runtime PM and system sleep

After commit e8665002477f0278f84f898145b1f141ba26ee26
(PM: Allow pm_runtime_suspend() to succeed during system suspend) it
is possible that a device resumed by the pm_runtime_resume(dev) in
pci_pm_prepare() will be suspended immediately from a work item,
timer function or otherwise, defeating the very purpose of calling
pm_runtime_resume(dev) from there.  To prevent that from happening
it is necessary to increment the runtime PM usage counter of the
device by replacing pm_runtime_resume() with pm_runtime_get_sync().
Moreover, the incremented runtime PM usage counter has to be
decremented by the corresponding pci_pm_complete(), via
pm_runtime_put_sync().

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@kernel.org
---
 drivers/pci/pci-driver.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -624,7 +624,7 @@ static int pci_pm_prepare(struct device
 	 * system from the sleep state, we'll have to prevent it from signaling
 	 * wake-up.
 	 */
-	pm_runtime_resume(dev);
+	pm_runtime_get_sync(dev);
 
 	if (drv && drv->pm && drv->pm->prepare)
 		error = drv->pm->prepare(dev);
@@ -638,6 +638,8 @@ static void pci_pm_complete(struct devic
 
 	if (drv && drv->pm && drv->pm->complete)
 		drv->pm->complete(dev);
+
+	pm_runtime_put_sync(dev);
 }
 
 #else /* !CONFIG_PM_SLEEP */

* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-20 21:28       ` Rafael J. Wysocki
  2011-06-21 14:52         ` Alan Stern
@ 2011-06-21 14:52         ` Alan Stern
  2011-06-21 23:49           ` Rafael J. Wysocki
  2011-06-21 23:49           ` Rafael J. Wysocki
  1 sibling, 2 replies; 36+ messages in thread
From: Alan Stern @ 2011-06-21 14:52 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci

On Mon, 20 Jun 2011, Rafael J. Wysocki wrote:

> > Ah, okay.  The PCI part makes sense then.
> 
> OK, so the appended patch is a modification of the $subject one using
> pm_runtime_put_sync() instead of pm_runtime_put_noidle().

Yes, it looks good.


> So, your point is that while .suspend() or .resume() are running, the
> synchronization between runtime PM and system suspend/resume should be the
> subsystem's problem, right?

Almost but not quite.  I was talking about the time period between 
.prepare() and .suspend() (and also the time period between .resume() 
and .complete()).

It's probably okay to prevent pm_runtime_suspend() from working during
.suspend() or .resume(), but it's not a good idea to prevent
pm_runtime_resume() from working then.

> I actually see a reason for doing this.  Namely, I don't really think
> driver writers should be bothered with preventing races between different
> PM callbacks from happening.  Runtime PM takes care of that at run time,
> the design of the system suspend/resume code ensures that the callbacks
> for the same device are executed sequentially, but if we allow runtime PM
> callbacks to be executed in parallel with system suspend/resume callbacks,
> someone has to prevent those callbacks from racing with each other.
> 
> Now, if you agree that that shouldn't be a driver's task, then it has to
> be the subsystem's one and I'm not sure what a subsystem can do other than
> disabling runtime PM or at least taking a reference on every device before
> calling device drivers' .suspend() callbacks.
> 
> Please note, I think that .prepare() and .complete() are somewhat special,
> so perhaps we should allow those to race with runtime PM callbacks, but IMO
> allowing .suspend() and .resume() to race with .runtime_suspend() and
> .runtime_resume() is not a good idea.

Races in the period after .suspend() and before .resume() will be 
handled by disabling runtime PM when .suspend() returns and enabling it 
before calling .resume().

During the .suspend and .resume callbacks, races with
.runtime_suspend() can be prevented by calling
pm_runtime_get_noresume() just before .suspend() and then calling
pm_runtime_put_sync() just after .resume().

Races with .runtime_resume() can be handled to some extent by putting a
runtime barrier immediately after the pm_runtime_get_noresume() call,
but that's not a perfect solution.  Is it good enough?
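
In outline, the core side of this would look something like the sketch
below (heavily simplified, not the actual __device_suspend()/device_resume()
code; the async machinery and most error handling are left out, and the
sketch_* names are made up):

#include <linux/device.h>
#include <linux/pm_runtime.h>

/* before the .suspend() callback: block runtime suspend (but not runtime
 * resume) and flush any pending runtime PM requests */
static int sketch_device_suspend(struct device *dev)
{
        int error = 0;

        pm_runtime_get_noresume(dev);
        pm_runtime_barrier(dev);

        if (dev->bus && dev->bus->pm && dev->bus->pm->suspend)
                error = dev->bus->pm->suspend(dev);

        pm_runtime_disable(dev);   /* no runtime PM until the resume phase */
        return error;
}

/* re-enable runtime PM before .resume() and drop the reference with a
 * synchronous idle check afterwards, so runtime PM works again in the
 * .resume() .. .complete() window */
static int sketch_device_resume(struct device *dev)
{
        int error = 0;

        pm_runtime_enable(dev);

        if (dev->bus && dev->bus->pm && dev->bus->pm->resume)
                error = dev->bus->pm->resume(dev);

        pm_runtime_put_sync(dev);
        return error;
}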

> > What I'm suggesting is to revert the commit but at the same time,
> > move the get_noresume() into __device_suspend() and the put_sync() into 
> > device_resume().
> 
> What about doing pm_runtime_get_noresume() and the pm_runtime_barrier()
> in dpm_prepare(), but _after_ calling device_prepare() and doing
> pm_runtime_put_noidle() in dpm_complete() _before_ calling .complete()
> from the subsystem

This does not address the issue of allowing runtime suspends in the 
windows between .prepare() - .suspend() and .resume() - .complete().

>  (a _put_sync() at this point will likely invoke
> .runtime_idle() from the subsystem before executing .complete(), which may
> not be desirable)?

It should be allowed.  The purpose of .complete() is not to re-enable
runtime power management of the device; it is to release resources
(like memory) allocated during .prepare() and perhaps also to allow new
children to be registered under the device.

Alan Stern


* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-21 14:52         ` Alan Stern
  2011-06-21 23:49           ` Rafael J. Wysocki
@ 2011-06-21 23:49           ` Rafael J. Wysocki
  2011-06-22 14:20             ` Alan Stern
  2011-06-22 14:20             ` Alan Stern
  1 sibling, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-06-21 23:49 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci

On Tuesday, June 21, 2011, Alan Stern wrote:
> On Mon, 20 Jun 2011, Rafael J. Wysocki wrote:
> 
> > > Ah, okay.  The PCI part makes sense then.
> > 
> > OK, so the appended patch is a modification of the $subject one using
> > pm_runtime_put_sync() instead of pm_runtime_put_noidle().
> 
> Yes, it looks good.

Cool, thanks!

> > So, your point is that while .suspend() or .resume() are running, the
> > synchronization between runtime PM and system suspend/resume should be the
> > subsystem's problem, right?
> 
> Almost but not quite.  I was talking about the time period between 
> .prepare() and .suspend() (and also the time period between .resume() 
> and .complete()).
> 
> It's probably okay to prevent pm_runtime_suspend() from working during
> .suspend() or .resume(), but it's not a good idea to prevent
> pm_runtime_resume() from working then.

OK, but taking a reference by means of pm_runtime_get_noresume() won't
block pm_runtime_resume().

> > I actually see a reason for doing this.  Namely, I don't really think
> > driver writers should be bothered with preventing races between different
> > PM callbacks from happening.  Runtime PM takes care of that at run time,
> > the design of the system suspend/resume code ensures that the callbacks
> > for the same device are executed sequentially, but if we allow runtime PM
> > callbacks to be executed in parallel with system suspend/resume callbacks,
> > someone has to prevent those callbacks from racing with each other.
> > 
> > Now, if you agree that that shouldn't be a driver's task, then it has to
> > be the subsystem's one and I'm not sure what a subsystem can do other than
> > disabling runtime PM or at least taking a reference on every device before
> > calling device drivers' .suspend() callbacks.
> > 
> > Please note, I think that .prepare() and .complete() are somewhat special,
> > so perhaps we should allow those to race with runtime PM callbacks, but IMO
> > allowing .suspend() and .resume() to race with .runtime_suspend() and
> > .runtime_resume() is not a good idea.
> 
> Races in the period after .suspend() and before .resume() will be 
> handled by disabling runtime PM when .suspend() returns and enabling it 
> before calling .resume().

OK

> During the .suspend and .resume callbacks, races with
> .runtime_suspend() can be prevented by calling
> pm_runtime_get_noresume() just before .suspend() and then calling
> pm_runtime_put_sync() just after .resume().

So, you seem to suggest to call pm_runtime_get_noresume() in
__device_suspend() and pm_runtime_put_sync() in device_resume().
That would be fine by me, except perhaps for the "sync" part of the "put".

> Races with .runtime_resume() can be handled to some extent by putting a
> runtime barrier immediately after the pm_runtime_get_noresume() call,
> but that's not a perfect solution.  Is it good enough?

It's not worse than what we had before, so I guess it should be enough.

> > > What I'm suggesting is to revert the commit but at the same time,
> > > move the get_noresume() into __device_suspend() and the put_sync() into 
> > > device_resume().
> > 
> > What about doing pm_runtime_get_noresume() and the pm_runtime_barrier()
> > in dpm_prepare(), but _after_ calling device_prepare() and doing
> > pm_runtime_put_noidle() in dpm_complete() _before_ calling .complete()
> > from the subsystem
> 
> This does not address the issue of allowing runtime suspends in the 
> windows between .prepare() - .suspend() and .resume() - .complete().

OK

> >  (a _put_sync() at this point will likely invoke
> > .runtime_idle() from the subsystem before executing .complete(), which may
> > not be desirable)?
> 
> It should be allowed.  The purpose of .complete() is not to re-enable
> runtime power management of the device; it is to release resources
> (like memory) allocated during .prepare() and perhaps also to allow new
> children to be registered under the device.

Right.  But does "allowed" mean the core _should_ do it at this point?
We may as well call pm_runtime_idle() directly from dpm_complete(), but
perhaps it's better to call it from device_resume(), so that it runs in
parallel for async devices.

Thanks,
Rafael

* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-21 23:49           ` Rafael J. Wysocki
  2011-06-22 14:20             ` Alan Stern
@ 2011-06-22 14:20             ` Alan Stern
  2011-06-23 17:46               ` Rafael J. Wysocki
  2011-06-23 17:46               ` Rafael J. Wysocki
  1 sibling, 2 replies; 36+ messages in thread
From: Alan Stern @ 2011-06-22 14:20 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci

On Wed, 22 Jun 2011, Rafael J. Wysocki wrote:

> > It's probably okay to prevent pm_runtime_suspend() from working during
> > .suspend() or .resume(), but it's not a good idea to prevent
> > pm_runtime_resume() from working then.
> 
> OK, but taking a reference by means of pm_runtime_get_noresume() won't
> block pm_runtime_resume().

Exactly my point -- we don't need to (and don't want to) block 
pm_runtime_resume during the .suspend() and .resume() callbacks.

> > During the .suspend and .resume callbacks, races with
> > .runtime_suspend() can be prevented by calling
> > pm_runtime_get_noresume() just before .suspend() and then calling
> > pm_runtime_put_sync() just after .resume().
> 
> So, you seem to suggest to call pm_runtime_get_noresume() in
> __device_suspend() and pm_runtime_put_sync() in device_resume().

Yes.  Also perhaps call pm_runtime_barrier() immediately after 
get_noresume.

> That would be fine by me, except perhaps for the "sync" part of the "put".

The main feature of this design is that it allows runtime PM to work 
between .resume() and .complete().  If you do a put_noidle instead of 
put_sync then you may prevent runtime PM from working properly.

> > >  (a _put_sync() at this point will likely invoke
> > > .runtime_idle() from the subsystem before executing .complete(), which may
> > > not be desirable)?
> > 
> > It should be allowed.  The purpose of .complete() is not to re-enable
> > runtime power management of the device; it is to release resources
> > (like memory) allocated during .prepare() and perhaps also to allow new
> > children to be registered under the device.
> 
> Right.  But does "allowed" mean the core _should_ do it at this point?
> We may as well call pm_runtime_idle() directly from dpm_complete(), but
> perhaps it's better to call it from device_resume(), so that it runs in
> parallel for async devices.

Calling pm_runtime_put_noidle() followed by pm_runtime_idle() is 
essentially the same as calling pm_runtime_put_sync() anyway.
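
In code terms (sketch only, made-up helper names), the two spellings come
down to:

#include <linux/device.h>
#include <linux/pm_runtime.h>

static void drop_ref_in_two_steps(struct device *dev)
{
        pm_runtime_put_noidle(dev);  /* decrement the usage counter only */
        pm_runtime_idle(dev);        /* then run a synchronous idle check */
}

static void drop_ref_in_one_step(struct device *dev)
{
        pm_runtime_put_sync(dev);    /* decrement + idle check in one call */
}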

If a subsystem really does want to block runtime PM between the 
.resume() and .complete() callbacks, it can do its own get_noresume and 
put_sync -- just as you have done with PCI.

Alan Stern


* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-22 14:20             ` Alan Stern
  2011-06-23 17:46               ` Rafael J. Wysocki
@ 2011-06-23 17:46               ` Rafael J. Wysocki
  2011-06-23 18:35                 ` Alan Stern
  2011-06-23 18:35                 ` Alan Stern
  1 sibling, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-06-23 17:46 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci

On Wednesday, June 22, 2011, Alan Stern wrote:
> On Wed, 22 Jun 2011, Rafael J. Wysocki wrote:
> 
> > > It's probably okay to prevent pm_runtime_suspend() from working during
> > > .suspend() or .resume(), but it's not a good idea to prevent
> > > pm_runtime_resume() from working then.
> > 
> > OK, but taking a reference by means of pm_runtime_get_noresume() won't
> > block pm_runtime_resume().
> 
> Exactly my point -- we don't need to (and don't want to) block 
> pm_runtime_resume during the .suspend() and .resume() callbacks.
> 
> > > During the .suspend and .resume callbacks, races with
> > > .runtime_suspend() can be prevented by calling
> > > pm_runtime_get_noresume() just before .suspend() and then calling
> > > pm_runtime_put_sync() just after .resume().
> > 
> > So, you seem to suggest to call pm_runtime_get_noresume() in
> > __device_suspend() and pm_runtime_put_sync() in device_resume().
> 
> Yes.  Also perhaps call pm_runtime_barrier() immediately after 
> get_noresume.
> 
> > That would be fine by me, perhaps up to the "sync" part of the "put".
> 
> The main feature of this design is that it allows runtime PM to work 
> between .resume() and .complete().  If you do a put_noidle instead of 
> put_sync then you may prevent runtime PM from working properly.
> 
> > > >  (a _put_sync() at this point will likely invoke
> > > > .runtime_idle() from the subsystem before executing .complete(), which may
> > > > not be desirable)?
> > > 
> > > It should be allowed.  The purpose of .complete() is not to re-enable
> > > runtime power management of the device; it is to release resources
> > > (like memory) allocated during .prepare() and perhaps also to allow new
> > > children to be registered under the device.
> > 
> > Right.  But does "allowed" mean the core _should_ do it at this point?
> > We may as well call pm_runtime_idle() directly from rpm_complete(), but
> > perhaps it's better to call it from device_resume(), so that it runs in
> > parallel for async devices.
> 
> Calling pm_runtime_put_noidle() followed by pm_runtime_idle() is 
> essentially the same as calling pm_runtime_put_sync() anyway.
> 
> If a subsystem really does want to block runtime PM between the 
> .resume() and .complete() callbacks, it can do its own get_noresume and 
> put_sync -- just as you have done with PCI.

OK, so what about the appended patch (the last hunk is necessary to make the
SCSI error handling work when runtime PM is disabled, it should be a separate
patch)?

Rafael

---
 drivers/base/power/main.c    |   27 ++++++++++++++++++++++-----
 drivers/base/power/runtime.c |   10 ++++++----
 2 files changed, 28 insertions(+), 9 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -505,6 +505,7 @@ static int legacy_resume(struct device *
 static int device_resume(struct device *dev, pm_message_t state, bool async)
 {
 	int error = 0;
+	bool put = false;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
@@ -521,6 +522,9 @@ static int device_resume(struct device *
 	if (!dev->power.is_suspended)
 		goto Unlock;
 
+	pm_runtime_enable(dev);
+	put = true;
+
 	if (dev->pwr_domain) {
 		pm_dev_dbg(dev, state, "power domain ");
 		error = pm_op(dev, &dev->pwr_domain->ops, state);
@@ -563,6 +567,10 @@ static int device_resume(struct device *
 	complete_all(&dev->power.completion);
 
 	TRACE_RESUME(error);
+
+	if (put)
+		pm_runtime_put_sync(dev);
+
 	return error;
 }
 
@@ -843,16 +851,22 @@ static int __device_suspend(struct devic
 	int error = 0;
 
 	dpm_wait_for_children(dev, async);
-	device_lock(dev);
 
 	if (async_error)
-		goto Unlock;
+		return 0;
+
+	pm_runtime_get_noresume(dev);
+	if (pm_runtime_barrier(dev) && device_may_wakeup(dev))
+		pm_wakeup_event(dev, 0);
 
 	if (pm_wakeup_pending()) {
+		pm_runtime_put_sync(dev);
 		async_error = -EBUSY;
-		goto Unlock;
+		return 0;
 	}
 
+	device_lock(dev);
+
 	if (dev->pwr_domain) {
 		pm_dev_dbg(dev, state, "power domain ");
 		error = pm_op(dev, &dev->pwr_domain->ops, state);
@@ -890,12 +904,15 @@ static int __device_suspend(struct devic
  End:
 	dev->power.is_suspended = !error;
 
- Unlock:
 	device_unlock(dev);
 	complete_all(&dev->power.completion);
 
-	if (error)
+	if (error) {
+		pm_runtime_put_sync(dev);
 		async_error = error;
+	} else if (dev->power.is_suspended) {
+		__pm_runtime_disable(dev, false);
+	}
 
 	return error;
 }
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- linux-2.6.orig/drivers/base/power/runtime.c
+++ linux-2.6/drivers/base/power/runtime.c
@@ -455,12 +455,14 @@ static int rpm_resume(struct device *dev
 	dev_dbg(dev, "%s flags 0x%x\n", __func__, rpmflags);
 
  repeat:
-	if (dev->power.runtime_error)
+	if (dev->power.runtime_error) {
 		retval = -EINVAL;
-	else if (dev->power.disable_depth > 0)
-		retval = -EAGAIN;
-	if (retval)
 		goto out;
+	} else if (dev->power.disable_depth > 0) {
+		if (!(rpmflags & RPM_GET_PUT))
+			retval = -EAGAIN;
+		goto out;
+	}
 
 	/*
 	 * Other scheduled or pending requests need to be canceled.  Small

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-23 17:46               ` Rafael J. Wysocki
@ 2011-06-23 18:35                 ` Alan Stern
  2011-06-23 20:49                   ` Rafael J. Wysocki
  2011-06-23 20:49                   ` Rafael J. Wysocki
  2011-06-23 18:35                 ` Alan Stern
  1 sibling, 2 replies; 36+ messages in thread
From: Alan Stern @ 2011-06-23 18:35 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci

On Thu, 23 Jun 2011, Rafael J. Wysocki wrote:

> OK, so what about the appended patch (the last hunk is necessary to make the
> SCSI error handling work when runtime PM is disabled, it should be a separate
> patch)?
> 
> Rafael
> 
> ---
>  drivers/base/power/main.c    |   27 ++++++++++++++++++++++-----
>  drivers/base/power/runtime.c |   10 ++++++----
>  2 files changed, 28 insertions(+), 9 deletions(-)
> 
> Index: linux-2.6/drivers/base/power/main.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/power/main.c
> +++ linux-2.6/drivers/base/power/main.c
> @@ -505,6 +505,7 @@ static int legacy_resume(struct device *
>  static int device_resume(struct device *dev, pm_message_t state, bool async)
>  {
>  	int error = 0;
> +	bool put = false;
>  
>  	TRACE_DEVICE(dev);
>  	TRACE_RESUME(0);
> @@ -521,6 +522,9 @@ static int device_resume(struct device *
>  	if (!dev->power.is_suspended)
>  		goto Unlock;
>  
> +	pm_runtime_enable(dev);
> +	put = true;
> +
>  	if (dev->pwr_domain) {
>  		pm_dev_dbg(dev, state, "power domain ");
>  		error = pm_op(dev, &dev->pwr_domain->ops, state);
> @@ -563,6 +567,10 @@ static int device_resume(struct device *
>  	complete_all(&dev->power.completion);
>  
>  	TRACE_RESUME(error);
> +
> +	if (put)
> +		pm_runtime_put_sync(dev);
> +
>  	return error;
>  }
>  
> @@ -843,16 +851,22 @@ static int __device_suspend(struct devic
>  	int error = 0;
>  
>  	dpm_wait_for_children(dev, async);
> -	device_lock(dev);
>  
>  	if (async_error)
> -		goto Unlock;
> +		return 0;
> +
> +	pm_runtime_get_noresume(dev);
> +	if (pm_runtime_barrier(dev) && device_may_wakeup(dev))
> +		pm_wakeup_event(dev, 0);
>  
>  	if (pm_wakeup_pending()) {
> +		pm_runtime_put_sync(dev);
>  		async_error = -EBUSY;
> -		goto Unlock;
> +		return 0;
>  	}
>  
> +	device_lock(dev);
> +
>  	if (dev->pwr_domain) {
>  		pm_dev_dbg(dev, state, "power domain ");
>  		error = pm_op(dev, &dev->pwr_domain->ops, state);
> @@ -890,12 +904,15 @@ static int __device_suspend(struct devic
>   End:
>  	dev->power.is_suspended = !error;
>  
> - Unlock:
>  	device_unlock(dev);
>  	complete_all(&dev->power.completion);
>  
> -	if (error)
> +	if (error) {
> +		pm_runtime_put_sync(dev);
>  		async_error = error;
> +	} else if (dev->power.is_suspended) {
> +		__pm_runtime_disable(dev, false);
> +	}
>  
>  	return error;
>  }

Looks right.


> Index: linux-2.6/drivers/base/power/runtime.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/power/runtime.c
> +++ linux-2.6/drivers/base/power/runtime.c
> @@ -455,12 +455,14 @@ static int rpm_resume(struct device *dev
>  	dev_dbg(dev, "%s flags 0x%x\n", __func__, rpmflags);
>  
>   repeat:
> -	if (dev->power.runtime_error)
> +	if (dev->power.runtime_error) {
>  		retval = -EINVAL;
> -	else if (dev->power.disable_depth > 0)
> -		retval = -EAGAIN;
> -	if (retval)
>  		goto out;
> +	} else if (dev->power.disable_depth > 0) {
> +		if (!(rpmflags & RPM_GET_PUT))
> +			retval = -EAGAIN;

Do you also want to check the current status?  If it isn't RPM_ACTIVE 
then perhaps you should return an error.

> +		goto out;
> +	}
>  
>  	/*
>  	 * Other scheduled or pending requests need to be canceled.  Small

Alan Stern


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-23 18:35                 ` Alan Stern
  2011-06-23 20:49                   ` Rafael J. Wysocki
@ 2011-06-23 20:49                   ` Rafael J. Wysocki
  2011-06-23 21:02                     ` Alan Stern
  2011-06-23 21:02                     ` Alan Stern
  1 sibling, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-06-23 20:49 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci

On Thursday, June 23, 2011, Alan Stern wrote:
> On Thu, 23 Jun 2011, Rafael J. Wysocki wrote:
> 
> > OK, so what about the appended patch (the last hunk is necessary to make the
> > SCSI error handling work when runtime PM is disabled, it should be a separate
> > patch)?
> > 
> > Rafael
> > 
> > ---
> >  drivers/base/power/main.c    |   27 ++++++++++++++++++++++-----
> >  drivers/base/power/runtime.c |   10 ++++++----
> >  2 files changed, 28 insertions(+), 9 deletions(-)
> > 
> > Index: linux-2.6/drivers/base/power/main.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/base/power/main.c
> > +++ linux-2.6/drivers/base/power/main.c
> > @@ -505,6 +505,7 @@ static int legacy_resume(struct device *
> >  static int device_resume(struct device *dev, pm_message_t state, bool async)
> >  {
> >  	int error = 0;
> > +	bool put = false;
> >  
> >  	TRACE_DEVICE(dev);
> >  	TRACE_RESUME(0);
> > @@ -521,6 +522,9 @@ static int device_resume(struct device *
> >  	if (!dev->power.is_suspended)
> >  		goto Unlock;
> >  
> > +	pm_runtime_enable(dev);
> > +	put = true;
> > +
> >  	if (dev->pwr_domain) {
> >  		pm_dev_dbg(dev, state, "power domain ");
> >  		error = pm_op(dev, &dev->pwr_domain->ops, state);
> > @@ -563,6 +567,10 @@ static int device_resume(struct device *
> >  	complete_all(&dev->power.completion);
> >  
> >  	TRACE_RESUME(error);
> > +
> > +	if (put)
> > +		pm_runtime_put_sync(dev);
> > +
> >  	return error;
> >  }
> >  
> > @@ -843,16 +851,22 @@ static int __device_suspend(struct devic
> >  	int error = 0;
> >  
> >  	dpm_wait_for_children(dev, async);
> > -	device_lock(dev);
> >  
> >  	if (async_error)
> > -		goto Unlock;
> > +		return 0;
> > +
> > +	pm_runtime_get_noresume(dev);
> > +	if (pm_runtime_barrier(dev) && device_may_wakeup(dev))
> > +		pm_wakeup_event(dev, 0);
> >  
> >  	if (pm_wakeup_pending()) {
> > +		pm_runtime_put_sync(dev);
> >  		async_error = -EBUSY;
> > -		goto Unlock;
> > +		return 0;
> >  	}
> >  
> > +	device_lock(dev);
> > +
> >  	if (dev->pwr_domain) {
> >  		pm_dev_dbg(dev, state, "power domain ");
> >  		error = pm_op(dev, &dev->pwr_domain->ops, state);
> > @@ -890,12 +904,15 @@ static int __device_suspend(struct devic
> >   End:
> >  	dev->power.is_suspended = !error;
> >  
> > - Unlock:
> >  	device_unlock(dev);
> >  	complete_all(&dev->power.completion);
> >  
> > -	if (error)
> > +	if (error) {
> > +		pm_runtime_put_sync(dev);
> >  		async_error = error;
> > +	} else if (dev->power.is_suspended) {
> > +		__pm_runtime_disable(dev, false);
> > +	}
> >  
> >  	return error;
> >  }
> 
> Looks right.

Great!  I'll prepare a final version with a changelog and documentation
update, then.

> > Index: linux-2.6/drivers/base/power/runtime.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/base/power/runtime.c
> > +++ linux-2.6/drivers/base/power/runtime.c
> > @@ -455,12 +455,14 @@ static int rpm_resume(struct device *dev
> >  	dev_dbg(dev, "%s flags 0x%x\n", __func__, rpmflags);
> >  
> >   repeat:
> > -	if (dev->power.runtime_error)
> > +	if (dev->power.runtime_error) {
> >  		retval = -EINVAL;
> > -	else if (dev->power.disable_depth > 0)
> > -		retval = -EAGAIN;
> > -	if (retval)
> >  		goto out;
> > +	} else if (dev->power.disable_depth > 0) {
> > +		if (!(rpmflags & RPM_GET_PUT))
> > +			retval = -EAGAIN;
> 
> Do you also want to check the current status?  If it isn't RPM_ACTIVE 
> then perhaps you should return an error.

That depends on whether or not we want RPM_ACTIVE to have any meaning for
devices whose power.disable_depth is nonzero.  My opinion is that if
power.disable_depth is nonzero, the runtime PM status of the device
shouldn't matter (it may be changed on the fly in ways that need not
reflect the real status).
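
For instance (a generic illustration, not code from any of the patches in
this thread), a driver's system-resume path may quite legitimately rewrite
the status while runtime PM is disabled, which is exactly why the status
cannot be trusted in that case:

#include <linux/pm_runtime.h>

/* Hypothetical driver resume callback, shown only for illustration. */
static int example_driver_resume(struct device *dev)
{
	int ret;

	pm_runtime_disable(dev);
	/* ... reprogram the hardware ... */
	ret = pm_runtime_set_active(dev);	/* status changed "on the fly" */
	pm_runtime_enable(dev);
	return ret;
}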

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-23 20:49                   ` Rafael J. Wysocki
  2011-06-23 21:02                     ` Alan Stern
@ 2011-06-23 21:02                     ` Alan Stern
  2011-06-23 21:16                       ` Rafael J. Wysocki
  2011-06-23 21:16                       ` Rafael J. Wysocki
  1 sibling, 2 replies; 36+ messages in thread
From: Alan Stern @ 2011-06-23 21:02 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci

On Thu, 23 Jun 2011, Rafael J. Wysocki wrote:

> > > Index: linux-2.6/drivers/base/power/runtime.c
> > > ===================================================================
> > > --- linux-2.6.orig/drivers/base/power/runtime.c
> > > +++ linux-2.6/drivers/base/power/runtime.c
> > > @@ -455,12 +455,14 @@ static int rpm_resume(struct device *dev
> > >  	dev_dbg(dev, "%s flags 0x%x\n", __func__, rpmflags);
> > >  
> > >   repeat:
> > > -	if (dev->power.runtime_error)
> > > +	if (dev->power.runtime_error) {
> > >  		retval = -EINVAL;
> > > -	else if (dev->power.disable_depth > 0)
> > > -		retval = -EAGAIN;
> > > -	if (retval)
> > >  		goto out;
> > > +	} else if (dev->power.disable_depth > 0) {
> > > +		if (!(rpmflags & RPM_GET_PUT))
> > > +			retval = -EAGAIN;
> > 
> > Do you also want to check the current status?  If it isn't RPM_ACTIVE 
> > then perhaps you should return an error.
> 
> That depends on whether or not we want RPM_ACTIVE to have any meaning for
> devices whose power.disable_depth is nonzero.  My opinion is that if
> power.disable_depth is nonzero, the runtime PM status of the device
> shouldn't matter (it may be changed on the fly in ways that need not
> reflect the real status).

Then maybe this disable_depth > 0 case should return something other
than 0.  Something new, like -EACCES.  That way the caller would
realize something strange was going on but wouldn't have to treat the 
situation as an error.

After all, the return value from pm_runtime_get_sync() is documented to 
be the error code for the underlying pm_runtime_resume().  It doesn't 
refer to the increment operation -- that always succeeds.
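
As a sketch of how a caller might consume such a convention, assuming
-EACCES (or whatever code ends up being reserved) really did mean "runtime
PM is disabled, but the usage count was incremented" (the helper below is
made up for illustration):

#include <linux/errno.h>
#include <linux/pm_runtime.h>

static int example_wake_device(struct device *dev)
{
	int err = pm_runtime_get_sync(dev);

	if (err == -EACCES)
		return 0;	/* runtime PM disabled; reference taken, carry on */
	if (err < 0) {
		pm_runtime_put_noidle(dev);	/* undo the usage-count increment */
		return err;
	}
	return 0;	/* device is active */
}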

Alan Stern


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-23 21:02                     ` Alan Stern
  2011-06-23 21:16                       ` Rafael J. Wysocki
@ 2011-06-23 21:16                       ` Rafael J. Wysocki
  2011-06-23 21:38                         ` Alan Stern
  2011-06-23 21:38                         ` Alan Stern
  1 sibling, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-06-23 21:16 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci

On Thursday, June 23, 2011, Alan Stern wrote:
> On Thu, 23 Jun 2011, Rafael J. Wysocki wrote:
> 
> > > > Index: linux-2.6/drivers/base/power/runtime.c
> > > > ===================================================================
> > > > --- linux-2.6.orig/drivers/base/power/runtime.c
> > > > +++ linux-2.6/drivers/base/power/runtime.c
> > > > @@ -455,12 +455,14 @@ static int rpm_resume(struct device *dev
> > > >  	dev_dbg(dev, "%s flags 0x%x\n", __func__, rpmflags);
> > > >  
> > > >   repeat:
> > > > -	if (dev->power.runtime_error)
> > > > +	if (dev->power.runtime_error) {
> > > >  		retval = -EINVAL;
> > > > -	else if (dev->power.disable_depth > 0)
> > > > -		retval = -EAGAIN;
> > > > -	if (retval)
> > > >  		goto out;
> > > > +	} else if (dev->power.disable_depth > 0) {
> > > > +		if (!(rpmflags & RPM_GET_PUT))
> > > > +			retval = -EAGAIN;
> > > 
> > > Do you also want to check the current status?  If it isn't RPM_ACTIVE 
> > > then perhaps you should return an error.
> > 
> > That depends on whether or not we want RPM_ACTIVE to have any meaning for
> > devices whose power.disable_depth is nonzero.  My opinion is that if
> > power.disable_depth is nonzero, the runtime PM status of the device
> > shouldn't matter (it may be changed on the fly in ways that need not
> > reflect the real status).
> 
> Then maybe this disable_depth > 0 case should return something other
> than 0.  Something new, like -EACCES.  That way the caller would
> realize something strange was going on but wouldn't have to treat the 
> situation as an error.

I would be fine with that, but then we'd need to reserve that error code,
so that it's not returned by subsystem callbacks (or even we should convert
it to a different error code if it is returned by the subsystem callback in
rpm_resume()). 

> After all, the return value from pm_runtime_get_sync() is documented to 
> be the error code for the underlying pm_runtime_resume().  It doesn't 
> refer to the increment operation -- that always succeeds.

That means we should change the caller, which is the SCSI subsystem in this
particular case, to check the error code.  The problem with this approach
is that the same error code may be returned in a different situation, so
we should prevent that from happening in the first place.  Still, suppose
that we do that and that the caller checks the error code.  What is it
supposed to do in that situation?  The only reasonable action for the
caller is to ignore the error code if it means disable_depth > 0 and go
on with whatever it has to do, but that's what it will do if the
pm_runtime_get_sync() returns 0 in that situation.

So, in my opinion it simply may be best to update the documentation of
pm_runtime_get_sync() along with the code changes. :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-23 21:16                       ` Rafael J. Wysocki
@ 2011-06-23 21:38                         ` Alan Stern
  2011-06-23 22:35                           ` Rafael J. Wysocki
  2011-06-23 22:35                           ` Rafael J. Wysocki
  2011-06-23 21:38                         ` Alan Stern
  1 sibling, 2 replies; 36+ messages in thread
From: Alan Stern @ 2011-06-23 21:38 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci

On Thu, 23 Jun 2011, Rafael J. Wysocki wrote:

> > Then maybe this disable_depth > 0 case should return something other
> > than 0.  Something new, like -EACCES.  That way the caller would
> > realize something strange was going on but wouldn't have to treat the 
> > situation as an error.
> 
> I would be fine with that, but then we'd need to reserve that error code,
> so that it's not returned by subsystem callbacks (or even we should convert
> it to a different error code if it is returned by the subsystem callback in
> rpm_resume()). 
> 
> > After all, the return value from pm_runtime_get_sync() is documented to 
> > be the error code for the underlying pm_runtime_resume().  It doesn't 
> > refer to the increment operation -- that always succeeds.
> 
> That means we should change the caller, which is the SCSI subsystem in this
> particular case, to check the error code.  The problem with this approach
> is that the same error code may be returned in a different situation, so
> we should prevent that from happening in the first place.  Still, suppose
> that we do that and that the caller checks the error code.  What is it
> supposed to do in that situation?  The only reasonable action for the
> caller is to ignore the error code if it means disable_depth > 0 and go
> on with whatever it has to do, but that's what it will do if the
> pm_runtime_get_sync() returns 0 in that situation.
> 
> So, in my opinion it simply may be best to update the documentation of
> pm_runtime_get_sync() along with the code changes. :-)

The only reason you're doing this is for the SCSI error-handler 
routine?

I think it would be easier to change that routine instead of the PM 
core.  It should be smart enough to know that a runtime PM call isn't 
needed during a system sleep transition, i.e., between the scsi_host's 
suspend and resume callbacks.  Maybe check the new is_suspended flag.  
You'd also have to make sure the scsi_host wasn't runtime suspended 
when the sleep begins, rather like PCI.
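
Very roughly, that alternative might look like the sketch below.  It is
only an illustration: it peeks at dev->power.is_suspended directly, glosses
over locking, and the matching "put" helper would need the same check so
the usage count stays balanced.

#include <linux/pm_runtime.h>
#include <scsi/scsi_host.h>

static int scsi_autopm_get_host_sketch(struct Scsi_Host *shost)
{
	struct device *dev = &shost->shost_gendev;
	int err;

	/*
	 * During a system sleep transition the PM core owns the device's
	 * power state, so no runtime PM call is needed (and runtime PM may
	 * well be disabled at this point anyway).
	 */
	if (dev->power.is_suspended)
		return 0;

	err = pm_runtime_get_sync(dev);
	if (err < 0) {
		pm_runtime_put_noidle(dev);
		return err;
	}
	return 0;
}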

I'm still not clear on why the error handler needs to run at this time.

Alan Stern


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-23 21:38                         ` Alan Stern
  2011-06-23 22:35                           ` Rafael J. Wysocki
@ 2011-06-23 22:35                           ` Rafael J. Wysocki
  2011-06-23 22:59                             ` Rafael J. Wysocki
  2011-06-23 22:59                             ` Rafael J. Wysocki
  1 sibling, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-06-23 22:35 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci, Tejun Heo

On Thursday, June 23, 2011, Alan Stern wrote:
> On Thu, 23 Jun 2011, Rafael J. Wysocki wrote:
> 
> > > Then maybe this disable_depth > 0 case should return something other
> > > than 0.  Something new, like -EACCES.  That way the caller would
> > > realize something strange was going on but wouldn't have to treat the 
> > > situation as an error.
> > 
> > I would be fine with that, but then we'd need to reserve that error code,
> > so that it's not returned by subsystem callbacks (or even we should convert
> > it to a different error code if it is returned by the subsystem callback in
> > rpm_resume()). 
> > 
> > > After all, the return value from pm_runtime_get_sync() is documented to 
> > > be the error code for the underlying pm_runtime_resume().  It doesn't 
> > > refer to the increment operation -- that always succeeds.
> > 
> > That means we should change the caller, which is the SCSI subsystem in this
> > particular case, to check the error code.  The problem with this approach
> > is that the same error code may be returned in a different situation, so
> > we should prevent that from happening in the first place.  Still, suppose
> > that we do that and that the caller checks the error code.  What is it
> > supposed to do in that situation?  The only reasonable action for the
> > caller is to ignore the error code if it means disable_depth > 0 and go
> > on with whatever it has to do, but that's what it will do if the
> > pm_runtime_get_sync() returns 0 in that situation.
> > 
> > So, in my opinion it simply may be best to update the documentation of
> > pm_runtime_get_sync() along with the code changes. :-)
> 
> The only reason you're doing this is for the SCSI error-handler 
> routine?

Yes, it is.

> I think it would be easier to change that routine instead of the PM 
> core.  It should be smart enough to know that a runtime PM call isn't 
> needed during a system sleep transition, i.e., between the scsi_host's 
> suspend and resume callbacks.  Maybe check the new is_suspended flag.  
> You'd also have to make sure the scsi_host wasn't runtime suspended 
> when the sleep begins, rather like PCI.

Well, I think the problem is still there even at run time if runtime PM
happens to be disabled for shost_gendev (this probably never happens in
practice, though, except when runtime PM is disabled during system
suspend, which also wasn't done before).

> I'm still not clear on why the error handler needs to run at this time.

Because SATA ports are suspended with the help of the SCSI error handling
mechanism (which Tejun claims is the best way to do that).

But the thing done by scsi_autopm_get_host() is entirely reasonable
(maybe except that calling pm_runtime_put_sync() after failing
pm_runtime_get_sync() is rather pointless, pm_runtime_put_noidle() would
be sufficient), but it doesn't work for disabled runtime PM.
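
The parenthetical point above, sketched as a hypothetical variant of the
helper (not a proposed patch):

#include <linux/pm_runtime.h>
#include <scsi/scsi_host.h>

static int scsi_autopm_get_host_variant(struct Scsi_Host *shost)
{
	int err = pm_runtime_get_sync(&shost->shost_gendev);

	if (err < 0) {
		/*
		 * The resume failed, so there is nothing active to idle;
		 * dropping the reference that was just taken is enough.
		 */
		pm_runtime_put_noidle(&shost->shost_gendev);
		return err;
	}
	return 0;
}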

So, the question is whether or not what scsi_autopm_get_host() does should work
when runtime PM is disabled and I think it should.  Hence the patch.

Now, I agree that the proposed change is a little ugly, because it makes
"resume with taking reference" behave differently from "resume".  The
solution to that would be to return a special error code in the "disabled
runtime PM" case.  However, in that case we'd need to change the runtime PM
code in general to return this particular error code in the "disabled runtime
PM" case and to prevent this error code from being returned in other
circumstances.  In addition to that, we'd need to change the SCSI code, of
course.

Do you think that would be better?

Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-23 22:35                           ` Rafael J. Wysocki
  2011-06-23 22:59                             ` Rafael J. Wysocki
@ 2011-06-23 22:59                             ` Rafael J. Wysocki
  2011-06-26  2:39                               ` Alan Stern
  2011-06-26  2:39                               ` Alan Stern
  1 sibling, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-06-23 22:59 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci, Tejun Heo

On Friday, June 24, 2011, Rafael J. Wysocki wrote:
> On Thursday, June 23, 2011, Alan Stern wrote:
> > On Thu, 23 Jun 2011, Rafael J. Wysocki wrote:
> > 
> > > > Then maybe this disable_depth > 0 case should return something other
> > > > than 0.  Something new, like -EACCES.  That way the caller would
> > > > realize something strange was going on but wouldn't have to treat the 
> > > > situation as an error.
> > > 
> > > I would be fine with that, but then we'd need to reserve that error code,
> > > so that it's not returned by subsystem callbacks (or even we should convert
> > > it to a different error code if it is returned by the subsystem callback in
> > > rpm_resume()). 
> > > 
> > > > After all, the return value from pm_runtime_get_sync() is documented to 
> > > > be the error code for the underlying pm_runtime_resume().  It doesn't 
> > > > refer to the increment operation -- that always succeeds.
> > > 
> > > That means we should change the caller, which is the SCSI subsystem in this
> > > particular case, to check the error code.  The problem with this approach
> > > is that the same error code may be returned in a different situation, so
> > > we should prevent that from happening in the first place.  Still, suppose
> > > that we do that and that the caller checks the error code.  What is it
> > > supposed to do in that situation?  The only reasonable action for the
> > > caller is to ignore the error code if it means disable_depth > 0 and go
> > > on with whatever it has to do, but that's what it will do if the
> > > pm_runtime_get_sync() returns 0 in that situation.
> > > 
> > > So, in my opinion it simply may be best to update the documentation of
> > > pm_runtime_get_sync() along with the code changes. :-)
> > 
> > The only reason you're doing this is for the SCSI error-handler 
> > routine?
> 
> Yes, it is.
> 
> > I think it would be easier to change that routine instead of the PM 
> > core.  It should be smart enough to know that a runtime PM call isn't 
> > needed during a system sleep transition, i.e., between the scsi_host's 
> > suspend and resume callbacks.  Maybe check the new is_suspended flag.  
> > You'd also have to make sure the scsi_host wasn't runtime suspended 
> > when the sleep begins, rather like PCI.
> 
> Well, I think the problem is still there even at run time if runtime PM
> happens to be disabled for shost_gendev (this probably never happens in
> practice, though, except when runtime PM is disabled during system
> suspend, which also wasn't done before).
> 
> > I'm still not clear on why the error handler needs to run at this time.
> 
> Because SATA ports are suspended with the help of the SCSI error handling
> mechanism (which Tejun claims is the best way to do that).
> 
> But the thing done by scsi_autopm_get_host() is entirely reasonable
> (maybe except that calling pm_runtime_put_sync() after failing
> pm_runtime_get_sync() is rather pointless, pm_runtime_put_noidle() would
> be sufficient), but it doesn't work for disabled runtime PM.
> 
> So, the question is whether or not what scsi_autopm_get_host() does should work
> when runtime PM is disabled and I think it should.  Hence the patch.
> 
> Now, I agree that the proposed change is a little ugly, because it makes
> "resume with taking reference" behave differently from "resume".  The
> solution to that would be to return a special error code in the "disabled
> runtime PM" case.  However, in that case we'd need to change the runtime PM
> code in general to return this particular error code in the "disabled runtime
> PM" case and to prevent this error code from being returned in other
> circumstances.  In addition to that, we'de need to change the SCSI code, of
> course.
> 
> Do you think that would be better?

I've carried out this exercise to see how complicated it is going to be
and it doesn't really seem to be _that_ complicated.  The appended patch
illustrates this, but it hasn't been tested, so caveat emptor.
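
For callers outside of SCSI, the intended usage pattern would be along
these lines (an untested sketch; foo_start() is only an illustration,
not existing code):

#include <linux/pm_runtime.h>

/*
 * Illustration only: pm_runtime_get_sync() always increments the usage
 * counter, so a caller that can live with "runtime PM disabled" treats
 * -EACCES as success and keeps the reference.  Any other error drops
 * the reference again, with _noidle so that no new suspend is started.
 */
static int foo_start(struct device *dev)
{
	int err = pm_runtime_get_sync(dev);

	if (err < 0 && err != -EACCES) {
		pm_runtime_put_noidle(dev);
		return err;
	}

	/* The usage counter is held here; -EACCES only means PM was disabled. */
	return 0;
}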

Thanks,
Rafael

---
 drivers/base/power/runtime.c |    9 +++++----
 drivers/scsi/scsi_pm.c       |    8 ++++----
 2 files changed, 9 insertions(+), 8 deletions(-)

Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- linux-2.6.orig/drivers/base/power/runtime.c
+++ linux-2.6/drivers/base/power/runtime.c
@@ -135,8 +135,9 @@ static int rpm_check_suspend_allowed(str
 
 	if (dev->power.runtime_error)
 		retval = -EINVAL;
-	else if (atomic_read(&dev->power.usage_count) > 0
-	    || dev->power.disable_depth > 0)
+	else if (dev->power.disable_depth > 0)
+		retval = -EACCES;
+	else if (atomic_read(&dev->power.usage_count) > 0)
 		retval = -EAGAIN;
 	else if (!pm_children_suspended(dev))
 		retval = -EBUSY;
@@ -262,7 +263,7 @@ static int rpm_callback(int (*cb)(struct
 		spin_lock_irq(&dev->power.lock);
 	}
 	dev->power.runtime_error = retval;
-	return retval;
+	return retval != -EACCES ? retval : -EIO;
 }
 
 /**
@@ -458,7 +459,7 @@ static int rpm_resume(struct device *dev
 	if (dev->power.runtime_error)
 		retval = -EINVAL;
 	else if (dev->power.disable_depth > 0)
-		retval = -EAGAIN;
+		retval = -EACCES;
 	if (retval)
 		goto out;
 
Index: linux-2.6/drivers/scsi/scsi_pm.c
===================================================================
--- linux-2.6.orig/drivers/scsi/scsi_pm.c
+++ linux-2.6/drivers/scsi/scsi_pm.c
@@ -144,9 +144,9 @@ int scsi_autopm_get_device(struct scsi_d
 	int	err;
 
 	err = pm_runtime_get_sync(&sdev->sdev_gendev);
-	if (err < 0)
+	if (err < 0 && err != -EACCES)
 		pm_runtime_put_sync(&sdev->sdev_gendev);
-	else if (err > 0)
+	else
 		err = 0;
 	return err;
 }
@@ -173,9 +173,9 @@ int scsi_autopm_get_host(struct Scsi_Hos
 	int	err;
 
 	err = pm_runtime_get_sync(&shost->shost_gendev);
-	if (err < 0)
+	if (err < 0 && err != -EACCES)
 		pm_runtime_put_sync(&shost->shost_gendev);
-	else if (err > 0)
+	else
 		err = 0;
 	return err;
 }

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-23 22:59                             ` Rafael J. Wysocki
@ 2011-06-26  2:39                               ` Alan Stern
  2011-06-26 12:22                                   ` Rafael J. Wysocki
  2011-06-26  2:39                               ` Alan Stern
  1 sibling, 1 reply; 36+ messages in thread
From: Alan Stern @ 2011-06-26  2:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci, Tejun Heo

On Fri, 24 Jun 2011, Rafael J. Wysocki wrote:

> > > I'm still not clear on why the error handler needs to run at this time.
> > 
> > Because SATA ports are suspended with the help of the SCSI error handling
> > mechanism (which Tejun claims is the best way to do that).

> I've carried out this exercise to see how complicated it is going to be
> and it doesn't really seem to be _that_ complicated.  The appended patch
> illustrates this, but it hasn't been tested, so caveat emptor.

The patch is straightforward enough.  But will it be sufficient?

Suppose a SATA port is already in runtime suspend when the system sleep
starts.  Will the error handler be able to do its special job?  I don't
know...  It may turn out to be necessary for the SATA port to be
runtime resumed somewhere along the line.
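
What I have in mind is roughly the following sort of guard in the port's
prepare/complete callbacks -- only a sketch, and the function names and
the device passed in are made up, not existing ata/scsi code:

#include <linux/pm_runtime.h>

static int foo_port_prepare(struct device *dev)
{
	/* Resume the port and hold a reference across the sleep transition. */
	pm_runtime_get_sync(dev);
	return 0;
}

static void foo_port_complete(struct device *dev)
{
	/* Balance the reference taken in foo_port_prepare(). */
	pm_runtime_put_noidle(dev);
}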

Alan Stern


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep
  2011-06-26  2:39                               ` Alan Stern
@ 2011-06-26 12:22                                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-06-26 12:22 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linux PM mailing list, LKML, Jesse Barnes, linux-pci, Tejun Heo

On Sunday, June 26, 2011, Alan Stern wrote:
> On Fri, 24 Jun 2011, Rafael J. Wysocki wrote:
> 
> > > > I'm still not clear on why the error handler needs to run at this time.
> > > 
> > > Because SATA ports are suspended with the help of the SCSI error handling
> > > mechanism (which Tejun claims is the best way to do that).
> 
> > I've carried out this exercise to see how complicated it is going to be
> > and it doesn't really seem to be _that_ complicated.  The appended patch
> > illustrates this, but it hasn't been tested, so caveat emptor.
> 
> The patch is straightforward enough.  But will it be sufficient?
> 
> Suppose a SATA port is already in runtime suspend when the system sleep
> starts.  Will the error handler be able to do its special job?  I don't
> know...  It may turn out to be necessary for the SATA port to be
> runtime resumed somewhere along the line.

That's correct, but at least in the usual situation (i.e. sdev_gendev is
not suspended at run time) the patch makes things work when runtime PM
is disabled during system suspend and enabled during system resume.

It still may be necessary to add code to the SATA subsystem if sdev_gendev
is to be suspended at run time for SATA controllers.  I'm not aware of any
such cases, though.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2011-06-26 12:23 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-19 19:49 [PATCH] PCI / PM: Block races between runtime PM and system sleep Rafael J. Wysocki
2011-06-20 14:46 ` Alan Stern
2011-06-20 14:46 ` Alan Stern
2011-06-20 19:42   ` Rafael J. Wysocki
2011-06-20 19:42   ` Rafael J. Wysocki
2011-06-20 21:00     ` Alan Stern
2011-06-20 21:28       ` Rafael J. Wysocki
2011-06-20 21:28       ` Rafael J. Wysocki
2011-06-21 14:52         ` Alan Stern
2011-06-21 14:52         ` Alan Stern
2011-06-21 23:49           ` Rafael J. Wysocki
2011-06-21 23:49           ` Rafael J. Wysocki
2011-06-22 14:20             ` Alan Stern
2011-06-22 14:20             ` Alan Stern
2011-06-23 17:46               ` Rafael J. Wysocki
2011-06-23 17:46               ` Rafael J. Wysocki
2011-06-23 18:35                 ` Alan Stern
2011-06-23 20:49                   ` Rafael J. Wysocki
2011-06-23 20:49                   ` Rafael J. Wysocki
2011-06-23 21:02                     ` Alan Stern
2011-06-23 21:02                     ` Alan Stern
2011-06-23 21:16                       ` Rafael J. Wysocki
2011-06-23 21:16                       ` Rafael J. Wysocki
2011-06-23 21:38                         ` Alan Stern
2011-06-23 22:35                           ` Rafael J. Wysocki
2011-06-23 22:35                           ` Rafael J. Wysocki
2011-06-23 22:59                             ` Rafael J. Wysocki
2011-06-23 22:59                             ` Rafael J. Wysocki
2011-06-26  2:39                               ` Alan Stern
2011-06-26 12:22                                 ` Rafael J. Wysocki
2011-06-26 12:22                                   ` Rafael J. Wysocki
2011-06-26  2:39                               ` Alan Stern
2011-06-23 21:38                         ` Alan Stern
2011-06-23 18:35                 ` Alan Stern
2011-06-20 21:00     ` Alan Stern
2011-06-19 19:49 Rafael J. Wysocki
