* [GIT PULL] PM updates for 2.6.33 @ 2009-12-05 21:16 Rafael J. Wysocki 2009-12-05 21:43 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-05 21:16 UTC (permalink / raw) To: Linus Torvalds; +Cc: LKML, ACPI Devel Maling List, pm list Hi Linus, Please pull power management updates for 2.6.33 from: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git for-linus They include: * Asynchronous suspend and resume infrastructure. For now, PCI, ACPI and serio devices are enabled to suspend and resume asynchronously. * Fixes for the runtime PM framework. * Hibernate cleanups from Nigel and Jiri Slaby. * Freezer optimisation from Tejun. Documentation/power/runtime_pm.txt | 12 +- drivers/acpi/glue.c | 3 + drivers/acpi/scan.c | 1 + drivers/base/core.c | 4 + drivers/base/power/Makefile | 2 +- drivers/base/power/common.c | 283 +++++++++++++++ drivers/base/power/main.c | 677 +++++++++++++++++++++++++++++++++--- drivers/base/power/power.h | 42 ++- drivers/base/power/runtime.c | 27 +- drivers/base/power/sysfs.c | 47 +++ drivers/input/serio/serio.c | 1 + drivers/pci/pci.c | 1 + drivers/pci/pcie/portdrv_core.c | 1 + include/linux/device.h | 11 + include/linux/pm.h | 21 +- include/linux/pm_link.h | 30 ++ include/linux/pm_runtime.h | 12 + include/linux/resume-trace.h | 7 + kernel/power/Kconfig | 14 + kernel/power/Makefile | 2 +- kernel/power/hibernate.c | 26 ++ kernel/power/main.c | 32 ++- kernel/power/process.c | 14 +- kernel/power/swap.c | 107 ++++++- kernel/power/swsusp.c | 188 ---------- 25 files changed, 1281 insertions(+), 284 deletions(-) --------------- Alan Stern (2): PM / Runtime: Export the PM runtime workqueue PM / Runtime: Use deferred_resume flag in pm_request_resume Jaswinder Singh Rajput (1): PM: Fix kernel-doc notation Jiri Slaby (1): PM / Hibernate: Swap, use KERN_CONT Nigel Cunningham (2): PM / Hibernate: Move swap functions to kernel/power/swap.c. 
PM / Hibernate: Shift remaining code from swsusp.c to hibernate.c Rafael J. Wysocki (15): PM: Introduce PM links framework PM: Asynchronous resume of devices PM: Asynchronous suspend of devices PM: Allow PCI devices to suspend/resume asynchronously PM: Allow ACPI devices to suspend/resume asynchronously PM: Add a switch for disabling/enabling asynchronous suspend/resume PM: Measure device suspend and resume times PM: Add facility for advanced testing of async suspend/resume PM: Measure suspend and resume times for individual devices PM: Allow serio input devices to suspend/resume asynchronously PM / Runtime: Fix lockdep warning in __pm_runtime_set_status() PM / Runtime: Ensure timer_expires is nonzero in pm_schedule_suspend() PM / Runtime: Make documentation of runtime_idle() agree with the code PM / Runtime: Remove unnecessary braces in __pm_runtime_set_status() PM: Add flag for devices capable of generating run-time wake-up events Stephen Rothwell (1): PM / Suspend: Using TASK_ macros requires sched.h Tejun Heo (1): PM / freezer: Don't get over-anxious while waiting ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-05 21:16 [GIT PULL] PM updates for 2.6.33 Rafael J. Wysocki @ 2009-12-05 21:43 ` Linus Torvalds 2009-12-05 21:58 ` Linus Torvalds 2009-12-06 0:29 ` Rafael J. Wysocki 0 siblings, 2 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-05 21:43 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: LKML, ACPI Devel Maling List, pm list On Sat, 5 Dec 2009, Rafael J. Wysocki wrote: > > * Asynchronous suspend and resume infrastructure. For now, PCI, ACPI and > serio devices are enabled to suspend and resume asynchronously. I really think this is totally and utterly broken. Both from an implementation standpoint _and_ from a pure conceptual one. Why isn't the suspend/resume async stuff just done like the init async stuff? We don't need that crazy per-device flag for initialization, neither do we need drivers "enabling" any async code at all. They just do some things asynchronously, and then at the end of init time we wait for all those async events. So why does suspend/resume need to do crazy sh*t instead? It all looks terminally broken: you force async suspend for all PCI drivers, even when it makes no sense. Rather than let the drivers that already know how to do things like disk spinup asynchronously just do it that way. The "timing" routines are also just crazy. What is the excuse for dpm_show_time() taking both start and stop times, since there is never any valid situation when it shouldn't have that do_gettimeofday(&stop) just before it? IOW - the whole end-time thing should be _inside_ dpm_show_time, rather than being done by the caller. No? In other words - I'm not pulling this crazy thing. You'd better explain why it was done that way, when we already have done the same things better before in different ways. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
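To make the dpm_show_time() objection in the message above concrete, here is a minimal userspace sketch. It is not the code from the patch series: the helper name show_elapsed_ms() and its signature are hypothetical, and clock_gettime() stands in for the kernel's time source. The only point is that the helper samples the end time itself, so a caller records nothing but the start.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical userspace analogue of a "show time" helper.
 * The caller passes only the start time; the stop time is taken here.
 * Returns the elapsed time in milliseconds.
 */
static long long show_elapsed_ms(const struct timespec *start, const char *what)
{
	struct timespec stop;
	long long ns;

	assert(start != NULL);
	clock_gettime(CLOCK_MONOTONIC, &stop);	/* end time sampled inside */

	ns = (long long)(stop.tv_sec - start->tv_sec) * 1000000000LL
	   + (stop.tv_nsec - start->tv_nsec);
	printf("%s complete after %lld msecs\n", what, ns / 1000000);
	return ns / 1000000;
}
```

A caller would take one timestamp with clock_gettime(CLOCK_MONOTONIC, &start) before the work and then call show_elapsed_ms(&start, "resume") afterwards, with no second timestamp of its own.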
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-05 21:43 ` Linus Torvalds @ 2009-12-05 21:58 ` Linus Torvalds 2009-12-05 23:55 ` Rafael J. Wysocki 2009-12-06 0:29 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-05 21:58 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: LKML, ACPI Devel Maling List, pm list On Sat, 5 Dec 2009, Linus Torvalds wrote: > > In other words - I'm not pulling this crazy thing. You'd better explain > why it was done that way, when we already have done the same things better > before in different ways. I get the feeling that all the crazy infrastructure was due to worrying about the suspend/resume topology. But the reason we don't worry about that during init is that it doesn't really tend to matter. Most slow operations are the things that aren't topology-aware, ie things like spinning up/down disks etc, that really could be done as a separate phase instead. For example, is there really any reason why resume doesn't look exactly like the init sequence? Drivers that do slow things can start async work to do them, and then at the end of the resume sequence we just do a "wait for all the async work", exactly like we do for the current init sequences. And yes, for the suspend sequence we obviously need to do any async work (and wait for it) before we actually shut down the controllers, but that would be _way_ more natural to do by just introducing a "pre-suspend" hook that walks the device tree and does any async stuff. And then just wait for the async stuff to finish before doing the suspend, and perhaps again before doing late_suspend (maybe somebody wants to do async stuff at the second stage too). Then, because we need a way to undo things if things go wrong in the middle (and because it's also nice to be symmetric), we'd probably want to introduce that kind of "post_resume()" callback that allows you have a separate async wakeup thing for resume time too. 
What are actually the expensive/slow things during suspend/resume? Am I wrong when I say it's things like disk spinup/spindown (and USB discovery, which needs USB-level support anyway, since it can involve devices that we didn't even know about before discovery started). Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
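The pattern proposed in the message above (drivers start their slow work in the background, and the core waits once at the end of the phase) can be sketched in userspace as follows. Presumably the init-time mechanism being referred to is the kernel's async_schedule() / async_synchronize_full() pair; the sketch below does not use it. start_async(), wait_for_all_async(), slow_spinup() and resume_all() are hypothetical names, and pthreads plus a 20 ms sleep are stand-ins chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define MAX_ASYNC 16

static pthread_t async_threads[MAX_ASYNC];
static int async_count;

/* Hypothetical helper: start one slow operation in the background. */
static void start_async(void *(*fn)(void *), void *arg)
{
	int err;

	assert(async_count < MAX_ASYNC);
	err = pthread_create(&async_threads[async_count], NULL, fn, arg);
	assert(err == 0);
	(void)err;
	async_count++;
}

/* Hypothetical helper: wait for everything started so far. */
static void wait_for_all_async(void)
{
	for (int i = 0; i < async_count; i++)
		pthread_join(async_threads[i], NULL);
	async_count = 0;
}

/* Stand-in for a slow resume step such as a disk spin-up. */
static void *slow_spinup(void *arg)
{
	struct timespec delay = { 0, 20 * 1000 * 1000 };	/* 20 ms */

	nanosleep(&delay, NULL);
	*(int *)arg = 1;	/* mark this device as ready */
	return NULL;
}

/* Kick off the slow part for every device, then wait once at the end. */
static int resume_all(int *ready, int n)
{
	int done = 0;

	for (int i = 0; i < n; i++)
		start_async(slow_spinup, &ready[i]);
	wait_for_all_async();

	for (int i = 0; i < n; i++)
		done += ready[i];
	return done;
}
```

In this shape the decision about what is safe to run in the background stays with each driver; the core only provides the single synchronization point.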
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-05 21:58 ` Linus Torvalds @ 2009-12-05 23:55 ` Rafael J. Wysocki 2009-12-06 0:45 ` Arjan van de Ven 2009-12-06 0:48 ` Linus Torvalds 0 siblings, 2 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-05 23:55 UTC (permalink / raw) To: Linus Torvalds; +Cc: LKML, ACPI Devel Maling List, pm list, Alan Stern On Saturday 05 December 2009, Linus Torvalds wrote: > > On Sat, 5 Dec 2009, Linus Torvalds wrote: > > > > In other words - I'm not pulling this crazy thing. You'd better explain > > why it was done that way, when we already have done the same things better > > before in different ways. OK, I'll send another pull request without these patches if the rest of the changes is fine with you (they are more important than the async stuff to me). > I get the feeling that all the crazy infrastructure was due to worrying > about the suspend/resume topology. Yes, that's the main reason. > But the reason we don't worry about that during init is that it doesn't > really tend to matter. Most slow operations are the things that aren't > topology-aware, ie things like spinning up/down disks etc, that really > could be done as a separate phase instead. It was based on the observation that in many cases the current drivers' suspend and resume callbacks can be run in parallel with the other drivers' callbacks without any changes to the drivers (and without introducing another phase of suspend for that matter), because there are no dependencies between them. The approach you're suggesting would require modifying individual drivers which I just wanted to avoid. If you don't like that, we'll have to take the longer route, although I'm afraid that will take lots of time and we won't be able to exploit the entire possible parallelism this way. > For example, is there really any reason why resume doesn't look exactly > like the init sequence? 
Drivers that do slow things can start async work > to do them, and then at the end of the resume sequence we just do a "wait > for all the async work", exactly like we do for the current init > sequences. During suspend we actually know what the dependences between the devices are and we can use that information to do more things in parallel. For instance, in the majority of cases (I'm yet to find a counterexample), the entire suspend callbacks of "leaf" PCI devices may be run in parallel with each other. So, the point is not to look for "async stuff" in a driver's suspend/resume callbacks, but to execute the whole suspend/resume callbacks in parallel, if possible. > And yes, for the suspend sequence we obviously need to do any async work > (and wait for it) before we actually shut down the controllers, but that > would be _way_ more natural to do by just introducing a "pre-suspend" hook > that walks the device tree and does any async stuff. And then just wait > for the async stuff to finish before doing the suspend, and perhaps again > before doing late_suspend (maybe somebody wants to do async stuff at the > second stage too). > > Then, because we need a way to undo things if things go wrong in the > middle (and because it's also nice to be symmetric), we'd probably want to > introduce that kind of "post_resume()" callback that allows you have a > separate async wakeup thing for resume time too. Yes, we can do that, but I'm afraid that the majority of drivers won't use the new hooks (people generally seem to be too reluctant to modify their suspend/resume callbacks, for fear of breaking things). Also, for an individual driver it really is difficult to separate the "async stuff" from the stuff which is not async, because everything that can be done in parallel with the other drivers' suspend callbacks is potentially async, as long as there are no dependences between the devices in question (like parent-child dependences, or PCI-shadow ACPI dependences). 
And it's generally worth doing that if a driver's suspend or resume callback calls msleep() for whatever reason. > What are actually the expensive/slow things during suspend/resume? Am I > wrong when I say it's things like disk spinup/spindown (and USB discovery, > which needs USB-level support anyway, since it can involve devices that we > didn't even know about before discovery started). Disk spinup/spindown takes time, but also some ACPI devices resume slowly, serio devices do that too and there are surprisingly many drivers that wait (using msleep()) during suspend and resume. Apart from this, every PCI device going from D0 to D3 during suspend and from D3 to D0 during resume requires us to sleep for 10 ms (the sleeping is done by the PCI core, so the drivers don't even realize it's there). Thanks, Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
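The idea argued for in the message above, running whole suspend callbacks in parallel while still honouring parent-child dependences, can be sketched in userspace roughly as follows. This is not the PM links code from the series: struct dev, dev_init(), dev_add_child() and suspend_all() are hypothetical names, and pthreads are a stand-in. Each device gets its own thread, and a parent's thread blocks until all of its children have finished suspending.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_CHILDREN 4
#define MAX_DEVS 16

struct dev {
	const char *name;
	struct dev *children[MAX_CHILDREN];
	int nr_children;
	int suspended;		/* set once this device has suspended */
	int order;		/* position in the global suspend order */
	pthread_mutex_t lock;
	pthread_cond_t done;
};

static pthread_mutex_t order_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_order;

static void dev_init(struct dev *d, const char *name)
{
	d->name = name;
	d->nr_children = 0;
	d->suspended = 0;
	d->order = -1;
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->done, NULL);
}

static void dev_add_child(struct dev *parent, struct dev *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	parent->children[parent->nr_children++] = child;
}

/* Block until device d has finished suspending. */
static void wait_suspended(struct dev *d)
{
	pthread_mutex_lock(&d->lock);
	while (!d->suspended)
		pthread_cond_wait(&d->done, &d->lock);
	pthread_mutex_unlock(&d->lock);
}

static void *suspend_thread(void *arg)
{
	struct dev *d = arg;
	int order;

	/* A parent must not suspend before its children. */
	for (int i = 0; i < d->nr_children; i++)
		wait_suspended(d->children[i]);

	/* The device's own suspend callback would run here. */

	pthread_mutex_lock(&order_lock);
	order = next_order++;
	pthread_mutex_unlock(&order_lock);

	pthread_mutex_lock(&d->lock);
	d->order = order;
	d->suspended = 1;
	pthread_cond_broadcast(&d->done);
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

/* Suspend all devices, one thread each; independent devices overlap. */
static void suspend_all(struct dev **devs, int n)
{
	pthread_t tid[MAX_DEVS];

	assert(n <= MAX_DEVS);
	for (int i = 0; i < n; i++) {
		int err = pthread_create(&tid[i], NULL, suspend_thread, devs[i]);

		assert(err == 0);
		(void)err;
	}
	for (int i = 0; i < n; i++)
		pthread_join(tid[i], NULL);
}
```

The sketch only knows about dependences that are recorded in the tree. A dependence that is not recorded anywhere, such as two PCI functions sharing registers, would not be honoured, so a scheme like this is only as safe as the dependency information it is given.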
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-05 23:55 ` Rafael J. Wysocki @ 2009-12-06 0:45 ` Arjan van de Ven 2009-12-06 1:26 ` Rafael J. Wysocki 2009-12-06 0:48 ` Linus Torvalds 1 sibling, 1 reply; 235+ messages in thread From: Arjan van de Ven @ 2009-12-06 0:45 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, 6 Dec 2009 00:55:36 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > Disk spinup/spindown takes time, but also some ACPI devices resume > slowly, serio devices do that too and there are surprisingly many > drivers that wait (using msleep() during suspend and resume). Apart > from this, every PCI device going from D0 to D3 during suspend and > from D3 to D0 during resume requires us to sleep for 10 ms (the > sleeping is done by the PCI core, so the drivers don't even realize > its there). maybe a good step is to make a scripts/bootgraph.pl equivalent for suspend/resume (or make a debug mode that outputs in a compatible format so that the script can be used as is.. I don't mind either way, and consider this my offer to help with such a script as long as there's sufficient logging in dmesg ;-) that way we can SEE which ones are an issue.... and by how much. -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 0:45 ` Arjan van de Ven @ 2009-12-06 1:26 ` Rafael J. Wysocki 2009-12-06 1:58 ` Arjan van de Ven 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-06 1:26 UTC (permalink / raw) To: Arjan van de Ven Cc: Linus Torvalds, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sunday 06 December 2009, Arjan van de Ven wrote: > On Sun, 6 Dec 2009 00:55:36 +0100 > "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > > > Disk spinup/spindown takes time, but also some ACPI devices resume > > slowly, serio devices do that too and there are surprisingly many > > drivers that wait (using msleep() during suspend and resume). Apart > > from this, every PCI device going from D0 to D3 during suspend and > > from D3 to D0 during resume requires us to sleep for 10 ms (the > > sleeping is done by the PCI core, so the drivers don't even realize > > its there). > > maybe a good step is to make a scripts/bootgraph.pl equivalent for > suspend/resume (or make a debug mode that outputs in a compatible format > so that the script can be used as is.. I don't mind either way, and > consider this my offer to help with such a script as long as there's > sufficient logging in dmesg ;-) OK, so what kind of logging is needed? > that way we can SEE which ones are an issue.... and by how much. Well, why not. Thanks, Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 1:26 ` Rafael J. Wysocki @ 2009-12-06 1:58 ` Arjan van de Ven 2009-12-06 8:39 ` Ingo Molnar 0 siblings, 1 reply; 235+ messages in thread From: Arjan van de Ven @ 2009-12-06 1:58 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, 6 Dec 2009 02:26:06 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > On Sunday 06 December 2009, Arjan van de Ven wrote: > > On Sun, 6 Dec 2009 00:55:36 +0100 > > "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > > > > > > Disk spinup/spindown takes time, but also some ACPI devices resume > > > slowly, serio devices do that too and there are surprisingly many > > > drivers that wait (using msleep() during suspend and resume). > > > Apart from this, every PCI device going from D0 to D3 during > > > suspend and from D3 to D0 during resume requires us to sleep for > > > 10 ms (the sleeping is done by the PCI core, so the drivers don't > > > even realize its there). > > > > maybe a good step is to make a scripts/bootgraph.pl equivalent for > > suspend/resume (or make a debug mode that outputs in a compatible > > format so that the script can be used as is.. I don't mind either > > way, and consider this my offer to help with such a script as long > > as there's sufficient logging in dmesg ;-) > > OK, so what kind of logging is needed? basically the equivalent of the two initcall_debug paths in init/main.c:do_one_initcall() which prints a start time and an end time (and a pid) for each init function; if we have the same for suspend calls (and resume)... we can make the tool graph it. Would be nice to get markers for start and end of the whole suspend sequence as well as for the resume sequence; those make it easier to know when the end is (so that the axis can be drawn etc) shouldn't be too hard to implement... 
-- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 235+ messages in thread
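The kind of logging asked for in the message above (a start line, an end line with the elapsed time, and a pid for each callback) could look roughly like the userspace sketch below. The wrapper name timed_pm_call(), the pm_callback_t type and the exact message strings are assumptions made for illustration; they are only loosely modeled on the initcall_debug output and are not guaranteed to match what scripts/bootgraph.pl parses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical callback type standing in for a suspend/resume callback. */
typedef int (*pm_callback_t)(void *dev);

/* Run one callback and log its start, its return value and its duration. */
static int timed_pm_call(const char *name, pm_callback_t cb, void *dev)
{
	struct timespec t0, t1;
	long long usecs;
	int ret;

	assert(cb != NULL);
	printf("calling  %s @ %d\n", name, (int)getpid());
	clock_gettime(CLOCK_MONOTONIC, &t0);
	ret = cb(dev);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	usecs = (long long)(t1.tv_sec - t0.tv_sec) * 1000000LL
	      + (t1.tv_nsec - t0.tv_nsec) / 1000;
	printf("call %s returned %d after %lld usecs\n", name, ret, usecs);
	return ret;
}

/* Stand-in for a driver's suspend callback. */
static int fake_suspend(void *dev)
{
	(void)dev;
	return 0;
}
```

Calling timed_pm_call("fake_suspend", fake_suspend, NULL) prints one "calling" line and one "returned" line, which is the pair of records a graphing tool needs per callback.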
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 1:58 ` Arjan van de Ven @ 2009-12-06 8:39 ` Ingo Molnar 0 siblings, 0 replies; 235+ messages in thread From: Ingo Molnar @ 2009-12-06 8:39 UTC (permalink / raw) To: Arjan van de Ven Cc: Rafael J. Wysocki, Linus Torvalds, LKML, ACPI Devel Maling List, pm list, Alan Stern * Arjan van de Ven <arjan@infradead.org> wrote: > On Sun, 6 Dec 2009 02:26:06 +0100 > "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > On Sunday 06 December 2009, Arjan van de Ven wrote: > > > On Sun, 6 Dec 2009 00:55:36 +0100 > > > "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > > > > > > > > > Disk spinup/spindown takes time, but also some ACPI devices resume > > > > slowly, serio devices do that too and there are surprisingly many > > > > drivers that wait (using msleep() during suspend and resume). > > > > Apart from this, every PCI device going from D0 to D3 during > > > > suspend and from D3 to D0 during resume requires us to sleep for > > > > 10 ms (the sleeping is done by the PCI core, so the drivers don't > > > > even realize its there). > > > > > > maybe a good step is to make a scripts/bootgraph.pl equivalent for > > > suspend/resume (or make a debug mode that outputs in a compatible > > > format so that the script can be used as is.. I don't mind either > > > way, and consider this my offer to help with such a script as long > > > as there's sufficient logging in dmesg ;-) > > > > OK, so what kind of logging is needed? > > basically the equivalent of the two initcall_debug paths in > > init/main.c:do_one_initcall() > > which prints a start time and an end time (and a pid) for each init > function; if we have the same for suspend calls (and resume)... we can > make the tool graph it. > > Would be nice to get markers for start and end of the whole suspend > sequence as well as for the resume sequence; those make it easier to > know when the end is (so that the axis can be drawn etc) > > shouldn't be too hard to implement... 
I think an even better option would be to extend 'perf timechart' to be suspend/resume aware: add a few tracepoint events and teach 'perf timechart' to draw them. (We should be able to do perf timechart record across suspend/resume cycles just fine.) ( Doing that would also improve the tracing facilities within suspend/resume quite significantly. It wouldn't just be a single-purpose thing for graphing, but perf trace and perf stat would work equally well. ) Thanks, Ingo ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-05 23:55 ` Rafael J. Wysocki 2009-12-06 0:45 ` Arjan van de Ven @ 2009-12-06 0:48 ` Linus Torvalds 2009-12-06 1:54 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-06 0:48 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, 6 Dec 2009, Rafael J. Wysocki wrote: > > The approach you're suggesting would require modifying individual drivers which > I just wanted to avoid. In the init path, we had the reverse worry - not wanting to make everything (where "everything" can be some subsystem like just the set of PCI drivers, of course - not really "everything" in an absolute sense) async, and then having to try to work out with the random driver that couldn't handle it. And there were _lots_ of drivers that couldn't handle it, because they knew they got woken up serially. The ATA layer needed to know about asynchronous things, because sometimes those independent devices aren't so independent at all. Which is why I don't think your approach is safe. Just to take an example of the whole "independent devices are not necessarily independent" thing - things like multi-port PCMCIA controllers generally show up as multiple PCI devices. But they are _not_ independent, and they actually share some registers. Resuming them asynchronously might well be ok, but maybe it's not. Who knows? In contrast, a device driver can generally know that certain _parts_ of the initialization is safe. As an example of that, I think the libata layer does all the port enumeration synchronously, but then once the ports have been identified, it does the rest async. That's the kind of decision we can sanely make when we do the async part as a "drivers may choose to do certain parts asynchronously". Doing it at a higher level sounds like a problem to me. 
> If you don't like that, we'll have to take the longer route, although > I'm afraid that will take lots of time and we won't be able to exploit > the entire possible parallelism this way. Sure. But I'd rather do the safe thing. Especially since there are likely just a few cases that really take a long time. > During suspend we actually know what the dependences between the devicces > are and we can use that information to do more things in parallel. For > instance, in the majority of cases (I'm yet to find a counter example), the > entire suspend callbacks of "leaf" PCI devices may be run in parallel with each > other. See above. That's simply not at all guaranteed to be true. And when it isn't true (ie different PCI leaf devices end up having subtle dependencies), now you need to start doing hacky things. I'd much rather have the individual drivers say "I can do this part in parallel", and not force it on them. Because it is definitely _not_ guaranteed that PCI devices can do parallel resume and suspend. > Yes, we can do that, but I'm afraid that the majority of drivers won't use the > new hooks (people generally seem to be to reluctant to modify their > suspend/resume callbacks not to break things). See above - I don't think this is a "majority" issue. I think it's a "let's figure out the problem spots, and fix _those_". IOW, get 2% of the coverage, and get 95% of the advantage. > Disk spinup/spindown takes time, but also some ACPI devices resume slowly, We actually saw that when we did async init. And it was horrible. There's nothing that says that the ACPI stuff necessarily even _can_ run in parallel. I think we currently only do the ACPI battery ops asynchronously. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 0:48 ` Linus Torvalds @ 2009-12-06 1:54 ` Rafael J. Wysocki 2009-12-06 1:57 ` Rafael J. Wysocki 2009-12-06 2:05 ` Linus Torvalds 0 siblings, 2 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-06 1:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: LKML, ACPI Devel Maling List, pm list, Alan Stern On Sunday 06 December 2009, Linus Torvalds wrote: > > On Sun, 6 Dec 2009, Rafael J. Wysocki wrote: > > > > The approach you're suggesting would require modifying individual drivers which > > I just wanted to avoid. > > In the init path, we had the reverse worry - not wanting to make > everything (where "everything" can be some subsystem like just the set of > PCI drivers, of course - not really "everything" in an absolute sense) > async, and then having to try to work out with the random driver that > couldn't handle it. > > And there were _lots_ of drivers that couldn't handle it, because they > knew they got woken up serially. The ATA layer needed to know about > asynchronous things, because sometimes those independent devices aren't so > independent at all. Which is why I don't think your approach is safe. While the current settings are probably unsafe (like enabling PCI devices to be suspended asynchronously by default if there are not any direct dependences between them), there are provisions to make everything safe, if we have enough information (which also is needed to put the required logic into the drivers). The device tree represents a good deal of the dependences between devices and the other dependences may be represented as PM links enforcing specific ordering of the PM callbacks. > Just to take an example of the whole "independent devices are not > necessarily independent" thing - things like multi-port PCMCIA controllers > generally show up as multiple PCI devices. But they are _not_ independent, > and they actually share some registers. 
Resuming them asynchronously might > well be ok, but maybe it's not. Who knows? I'd say if there's a worry that the same register may be accessed concurrently from two different code paths, there should be some locking in place. > In contrast, a device driver can generally know that certain _parts_ of > the initialization is safe. As an example of that, I think the libata > layer does all the port enumeration synchronously, but then once the ports > have been identified, it does the rest async. > > That's the kind of decision we can sanely make when we do the async part > as a "drivers may choose to do certain parts asynchronously". Doing it at > a higher level sounds like a problem to me. The difference between suspend and initialization is that during suspend we have already enumerated all devices and we should know how they depend on each other (and we really should know that if we are to actually understand how things work), so we can represent that information somehow and use it to do things at the higher level. How to represent it is a different matter, but in principle it should be possible. > > If you don't like that, we'll have to take the longer route, although > > I'm afraid that will take lots of time and we won't be able to exploit > > the entire possible parallelism this way. > > Sure. But I'd rather do the safe thing. Especially since there are likely > just a few cases that really take a long time. And there are lots of small sleeps here and there that accumulate and are entirely avoidable. > > During suspend we actually know what the dependences between the devicces > > are and we can use that information to do more things in parallel. For > > instance, in the majority of cases (I'm yet to find a counter example), the > > entire suspend callbacks of "leaf" PCI devices may be run in parallel with each > > other. > > See above. That's simply not at all guaranteed to be true. 
> > And when it isn't true (ie different PCI leaf devices end up having subtle > dependencies), now you need to start doing hacky things. > > I'd much rather have the individual drivers say "I can do this part in > parallel", and not force it on them. Because it is definitely _not_ > guaranteed that PCI devices can do parallel resume and suspend. OK, it's not guaranteed, but why not to do this on systems where it's known to work? > > Yes, we can do that, but I'm afraid that the majority of drivers won't use the > > new hooks (people generally seem to be to reluctant to modify their > > suspend/resume callbacks not to break things). > > See above - I don't think this is a "majority" issue. I think it's a > "let's figure out the problem spots, and fix _those_". IOW, get 2% of the > coverage, and get 95% of the advantage. I wouldn't really like to add even more suspend/resume callbacks for this purpose, because we already have so many of them. And even if we do that, I don't really expect drivers to start using them any time soon. > > Disk spinup/spindown takes time, but also some ACPI devices resume slowly, > > We actually saw that when we did async init. And it was horrible. There's > nothing that says that the ACPI stuff necessarily even _can_ run in > parallel. > > I think we currently only do the ACPI battery ops asynchronously. There are only a few ACPI devices that have real suspend/resume callbacks and I haven't see problems with these in practice. Thanks, Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 1:54 ` Rafael J. Wysocki @ 2009-12-06 1:57 ` Rafael J. Wysocki 2009-12-06 2:05 ` Linus Torvalds 1 sibling, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-06 1:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: LKML, ACPI Devel Maling List, pm list, Alan Stern On Sunday 06 December 2009, Rafael J. Wysocki wrote: > On Sunday 06 December 2009, Linus Torvalds wrote: > > > > On Sun, 6 Dec 2009, Rafael J. Wysocki wrote: > > > > > > The approach you're suggesting would require modifying individual drivers which > > > I just wanted to avoid. > > > > In the init path, we had the reverse worry - not wanting to make > > everything (where "everything" can be some subsystem like just the set of > > PCI drivers, of course - not really "everything" in an absolute sense) > > async, and then having to try to work out with the random driver that > > couldn't handle it. > > > > And there were _lots_ of drivers that couldn't handle it, because they > > knew they got woken up serially. The ATA layer needed to know about > > asynchronous things, because sometimes those independent devices aren't so > > independent at all. Which is why I don't think your approach is safe. > > While the current settings are probably unsafe (like enabling PCI devices > to be suspended asynchronously by default if there are not any direct > dependences between them), there are provisions to make eveything safe, if > we have enough information (which also is needed to put the required logic into > the drivers). The device tree represents a good deal of the dependences > between devices and the other dependences may be represented as PM links > enforcing specific ordering of the PM callbacks. > > > Just to take an example of the whole "independent devices are not > > necessarily independent" thing - things like multi-port PCMCIA controllers > > generally show up as multiple PCI devices. 
But they are _not_ independent, > > and they actually share some registers. Resuming them asynchronously might > > well be ok, but maybe it's not. Who knows? > > I'd say if there's a worry that the same register may be accessed concurrently > from two different code paths, there should be some locking in place. > > > In contrast, a device driver can generally know that certain _parts_ of > > the initialization is safe. As an example of that, I think the libata > > layer does all the port enumeration synchronously, but then once the ports > > have been identified, it does the rest async. > > > > That's the kind of decision we can sanely make when we do the async part > > as a "drivers may choose to do certain parts asynchronously". Doing it at > > a higher level sounds like a problem to me. > > The difference between suspend and initialization is that during suspend we > have already enumerated all devices and we should know how they depend on > each other (and we really should know that if we are to actually understand how > things work), so we can represent that information somehow and use it to do > things at the higher level. > > How to represent it is a different matter, but in principle it should be > possible. > > > > If you don't like that, we'll have to take the longer route, although > > > I'm afraid that will take lots of time and we won't be able to exploit > > > the entire possible parallelism this way. > > > > Sure. But I'd rather do the safe thing. Especially since there are likely > > just a few cases that really take a long time. > > And there are lots of small sleeps here and there that accumulate and are > entirely avoidable. I mean, it is avoidable to do all these sleeps sequentially. Thanks, Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
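As a rough illustration of the closing remark above about sequential sleeps: if every device sleeps for a fixed time and the sleeps happen one after another, the total is their sum; if independent devices sleep concurrently, the lower bound is the longest single sleep. The helper names below are hypothetical, the 10 ms figure is the per-device PCI delay quoted earlier in this thread, and the calculation ignores dependences and scheduling overhead.

```c
#include <assert.h>

/* Total time when the per-device sleeps are done one after another. */
static unsigned int sequential_ms(const unsigned int *delay_ms, int n)
{
	unsigned int total = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		total += delay_ms[i];
	return total;
}

/* Idealized lower bound when independent devices sleep concurrently. */
static unsigned int parallel_ms(const unsigned int *delay_ms, int n)
{
	unsigned int longest = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		if (delay_ms[i] > longest)
			longest = delay_ms[i];
	return longest;
}
```

For four devices at 10 ms each, the sequential figure is 40 ms and the idealized parallel figure is 10 ms; the device count is an arbitrary example, not a measurement.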
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 1:54 ` Rafael J. Wysocki 2009-12-06 1:57 ` Rafael J. Wysocki @ 2009-12-06 2:05 ` Linus Torvalds 2009-12-06 2:36 ` Rafael J. Wysocki ` (2 more replies) 1 sibling, 3 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-06 2:05 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, 6 Dec 2009, Rafael J. Wysocki wrote: > > While the current settings are probably unsafe (like enabling PCI devices > to be suspended asynchronously by default if there are not any direct > dependences between them), there are provisions to make eveything safe, if > we have enough information (which also is needed to put the required logic into > the drivers). I disagree. Think of a situation that we already handle pretty poorly: USB mass storage devices over a suspend/resume. > The device tree represents a good deal of the dependences > between devices and the other dependences may be represented as PM links > enforcing specific ordering of the PM callbacks. The device tree means nothing at all, because it may need to be entirely rebuilt at resume time. Optimally, what we _should_ be doing (and aren't) for suspend/resume of USB is to just tear down the whole topology and rebuild it and re-connect the things like mass storage devices. IOW, there would be no device tree to describe the topology, because we're finding it anew. And it's one of the things we _would_ want to do asynchronously with other things. We don't want to build up some irrelevant PM links and callbacks. We don't want to have some completely made-up new infrastructure for something that we _already_ want to handle totally differently for init time. IOW, I argue very strongly against making up something PM-specific, when there really doesn't seem to be much of an advantage. We're much better off trying to share the init code than making up something new. 
> I'd say if there's a worry that the same register may be accessed concurrently > from two different code paths, there should be some locking in place. Yeah. And I wish ACPI didn't exist at all. We don't know. And we want to _limit_ our exposure to these things. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 2:05 ` Linus Torvalds @ 2009-12-06 2:36 ` Rafael J. Wysocki 2009-12-06 15:23 ` Alan Stern 2009-12-06 19:35 ` Arjan van de Ven 2 siblings, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-06 2:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: LKML, ACPI Devel Maling List, pm list, Alan Stern On Sunday 06 December 2009, Linus Torvalds wrote: > > On Sun, 6 Dec 2009, Rafael J. Wysocki wrote: > > > > While the current settings are probably unsafe (like enabling PCI devices > > to be suspended asynchronously by default if there are not any direct > > dependences between them), there are provisions to make eveything safe, if > > we have enough information (which also is needed to put the required logic into > > the drivers). > > I disagree. > > Think of a situation that we already handle pretty poorly: USB mass > storage devices over a suspend/resume. > > > The device tree represents a good deal of the dependences > > between devices and the other dependences may be represented as PM links > > enforcing specific ordering of the PM callbacks. > > The device tree means nothing at all, because it may need to be entirely > rebuilt at resume time. With that assumption we have no choice but to leave the async stuff to the drivers, which generally I'm fine with, although I really don't expect to see it done. > Optimally, what we _should_ be doing (and aren't) for suspend/resume of > USB is to just tear down the whole topology and rebuild it and re-connect > the things like mass storage devices. IOW, there would be no device tree > to describe the topology, because we're finding it anew. And it's one of > the things we _would_ want to do asynchronously with other things. I think you should tell that to the USB people, because they don't seem to think this way. [Side note, I do think that at least some information in the device tree will remain valid over suspend/resume, but this is a different matter.] 
> We don't want to build up some irrelevant PM links and callbacks. We don't > want to have some completely made-up new infrastructure for something that > we _already_ want to handle totally differently for init time. > > IOW, I argue very strongly against making up something PM-specific, when > there really doesn't seem to be much of an advantage. We're much better > off trying to share the init code than making up something new. > > > I'd say if there's a worry that the same register may be accessed concurrently > > from two different code paths, there should be some locking in place. > > Yeah. And I wish ACPI didn't exist at all. We don't know. > > And we want to _limit_ our exposure to these things. Don't worry, I'm not going to touch async suspend/resume again, unless somebody makes me do it. BTW, you seem to have some quite strong opinions about power management that you only share with people when somebody sends you patches you don't like. I guess it will be much more productive if we know your thoughts about it in advance, so I hope you won't mind being sent CCs of core PM patches posted to linux-pm for discussions. Thanks, Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 2:05 ` Linus Torvalds 2009-12-06 2:36 ` Rafael J. Wysocki @ 2009-12-06 15:23 ` Alan Stern 2009-12-06 19:04 ` [linux-pm] " Victor Lowther ` (2 more replies) 2009-12-06 19:35 ` Arjan van de Ven 2 siblings, 3 replies; 235+ messages in thread From: Alan Stern @ 2009-12-06 15:23 UTC (permalink / raw) To: Linus Torvalds; +Cc: Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Sat, 5 Dec 2009, Linus Torvalds wrote: > Think of a situation that we already handle pretty poorly: USB mass > storage devices over a suspend/resume. > > > The device tree represents a good deal of the dependences > > between devices and the other dependences may be represented as PM links > > enforcing specific ordering of the PM callbacks. > > The device tree means nothing at all, because it may need to be entirely > rebuilt at resume time. Nonsense. > Optimally, what we _should_ be doing (and aren't) for suspend/resume of > USB is to just tear down the whole topology and rebuild it and re-connect > the things like mass storage devices. IOW, there would be no device tree > to describe the topology, because we're finding it anew. And it's one of > the things we _would_ want to do asynchronously with other things. That's ridiculous. Having gone to all the trouble of building a device tree, one which is presumably still almost entirely correct, why go to all the trouble of tearing it down only to rebuild it again? (Note: I'm talking about resume-from-RAM here, not resume-from-hibernation.) Instead what we do is verify that the devices we remember from before the suspend are still there, and then asynchronously handle new devices which have been plugged in during the meantime. Doing this involves relatively little extra or new code; most of the routines are shared with the runtime PM and device reset paths. As for asynchronicity... At init time, USB device discovery truly is asynchronous. 
It can happen long after you log in (especially if you don't plug in the device until after you log in!). But at resume time we are more highly constrained. User processes cannot be unfrozen until all the devices have been resumed; otherwise they would encounter errors when trying to do I/O to a suspended device. (With the runtime PM framework this is much less of a problem, but plenty of drivers don't support runtime PM yet.) > We don't want to build up some irrelevant PM links and callbacks. We don't > want to have some completely made-up new infrastructure for something that > we _already_ want to handle totally differently for init time. > > IOW, I argue very strongly against making up something PM-specific, when > there really doesn't seem to be much of an advantage. We're much better > off trying to share the init code than making up something new. If I understand correctly, what you're suggesting is impractical. You would have each driver responsible for resuming the devices it registers. If it registered some children synchronously (during the parent's probe) then it would resume them synchronously (during the parent's resume); if it registered them asynchronously then it would resume them asynchronously. In essence, every single device_add() or device_register() call would have to be paired with a resume call. To make such significant changes in every driver would be prohibitively difficult. What we need is a compromise which gives drivers control over the resume process without making them responsible for actually carrying it out. So consider this suggestion: Let's define PM groups. Each device belongs to a group, and each group (except group 0, the initial group) has an owner device. By default a device is added to its parent's group during registration, but the driver can request that it be assigned to a different group, which must be owned by that parent. During resume, each PM group would correspond to an async task. 
The devices in each group would be resumed sequentially, in order of registration, but asynchronously with respect to other groups. The async thread to resume a group would be launched after the group's owner device was resumed. So for example, the sibling functions on a PCI card could all be assigned to the same group, but different cards could belong to different groups. Likewise for ATA and PCMCIA controllers. Extra cross-group constraints could be added if needed, but there should be relatively few of them. This way drivers can decide which of their devices will be resumed in sequence or concurrently, but they won't have to do any of the necessary work. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
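Editorial sketch of the PM-groups proposal above, as a toy user-space model rather than kernel code. Every name in it (`PMGroups`, `new_group`, `register`, `resume_all`, the device names) is made up for illustration, and "resuming" a device is reduced to recording its name. It only tries to show the two ordering rules from the proposal: devices within a group resume sequentially in registration order, and a group's task is launched only after the group's owner has resumed.

```python
import threading


class PMGroups:
    """Toy model of the PM-group idea; all names here are hypothetical."""

    def __init__(self):
        self.members = {0: []}   # group id -> device names, in registration order
        self.owned = {}          # device name -> ids of the groups it owns
        self.resumed = []        # global order in which devices were resumed
        self._lock = threading.Lock()
        self._threads = []

    def new_group(self, owner):
        """Create a new group owned by an already registered device."""
        gid = len(self.members)
        self.members[gid] = []
        self.owned.setdefault(owner, []).append(gid)
        return gid

    def register(self, name, group=0):
        """Add a device to a group (group 0 is the initial, ownerless one)."""
        self.members[group].append(name)

    def _resume_group(self, gid):
        # Devices inside one group are resumed one after another.
        for name in self.members[gid]:
            with self._lock:
                self.resumed.append(name)          # stand-in for the real resume
                # Groups owned by this device may be launched only now.
                for child_gid in self.owned.get(name, []):
                    t = threading.Thread(target=self._resume_group,
                                         args=(child_gid,))
                    self._threads.append(t)
                    t.start()

    def resume_all(self):
        self._resume_group(0)
        i = 0
        while i < len(self._threads):   # the list can grow while we wait
            self._threads[i].join()
            i += 1


pm = PMGroups()
pm.register("pci_bridge")                      # lives in group 0
card_a = pm.new_group(owner="pci_bridge")
card_b = pm.new_group(owner="pci_bridge")
pm.register("card_a.fn0", card_a)              # sibling functions share a group
pm.register("card_a.fn1", card_a)
pm.register("card_b.fn0", card_b)              # a different card, a different group
pm.resume_all()
print(pm.resumed)
```

The relative order of the two cards is left to the scheduler; only the order inside each group and the owner-first rule are fixed.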
* Re: [linux-pm] [GIT PULL] PM updates for 2.6.33 2009-12-06 15:23 ` Alan Stern @ 2009-12-06 19:04 ` Victor Lowther 2009-12-07 3:57 ` Zhang Rui 2009-12-07 5:20 ` Linus Torvalds 2 siblings, 0 replies; 235+ messages in thread From: Victor Lowther @ 2009-12-06 19:04 UTC (permalink / raw) To: Alan Stern; +Cc: ACPI Devel Maling List, pm list, LKML On Sun, 2009-12-06 at 10:23 -0500, Alan Stern wrote: > On Sat, 5 Dec 2009, Linus Torvalds wrote: > That's ridiculous. Having gone to all the trouble of building a device > tree, one which is presumably still almost entirely correct, why go to > all the trouble of tearing it down only to rebuild it again? (Note: > I'm talking about resume-from-RAM here, not resume-from-hibernation.) There should be nothing special or privileged at all about the device tree that gets built at boot time. Consider the scenario of the laptop user with a docking station. Adding, removing, and rewriting vast swaths of the device tree across suspend/resume and hibernate/thaw is very easy to do when you are plugging a laptop into one or more docking stations. > Instead what we do is verify that the devices we remember from before > the suspend are still there, and then asynchronously handle new devices > which have been plugged in during the meantime. Doing this involves > relatively little extra or new code; most of the routines are shared > with the runtime PM and device reset paths. Devices can vanish across suspend to RAM just as easily as they can be added. > > _______________________________________________ > linux-pm mailing list > linux-pm@lists.linux-foundation.org > https://lists.linux-foundation.org/mailman/listinfo/linux-pm ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 15:23 ` Alan Stern 2009-12-06 19:04 ` [linux-pm] " Victor Lowther @ 2009-12-07 3:57 ` Zhang Rui 2009-12-07 5:57 ` Linus Torvalds 2009-12-07 5:20 ` Linus Torvalds 2 siblings, 1 reply; 235+ messages in thread From: Zhang Rui @ 2009-12-07 3:57 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Sun, 2009-12-06 at 23:23 +0800, Alan Stern wrote: > On Sat, 5 Dec 2009, Linus Torvalds wrote: > > > Think of a situation that we already handle pretty poorly: USB mass > > storage devices over a suspend/resume. > > > > > The device tree represents a good deal of the dependences > > > between devices and the other dependences may be represented as PM links > > > enforcing specific ordering of the PM callbacks. > > > > The device tree means nothing at all, because it may need to be entirely > > rebuilt at resume time. > > Nonsense. > > > Optimally, what we _should_ be doing (and aren't) for suspend/resume of > > USB is to just tear down the whole topology and rebuild it and re-connect > > the things like mass storage devices. IOW, there would be no device tree > > to describe the topology, because we're finding it anew. And it's one of > > the things we _would_ want to do asynchronously with other things. > > That's ridiculous. Having gone to all the trouble of building a device > tree, one which is presumably still almost entirely correct, why go to > all the trouble of tearing it down only to rebuild it again? (Note: > I'm talking about resume-from-RAM here, not resume-from-hibernation.) > > Instead what we do is verify that the devices we remember from before > the suspend are still there, and then asynchronously handle new devices > which have been plugged in during the meantime. Doing this involves > relatively little extra or new code; most of the routines are shared > with the runtime PM and device reset paths. > > As for asynchronicity... 
At init time, USB device discovery truly is > asynchronous. It can happen long after you log in (especially if you > don't plug in the device until after you log in!). But at resume time > we are more highly constrained. User processes cannot be unfrozen > until all the devices have been resumed; otherwise they would encounter > errors when trying to do I/O to a suspended device. (With the runtime > PM framework this is much less of a problem, but plenty of drivers > don't support runtime PM yet.) > > > > We don't want to build up some irrelevant PM links and callbacks. We don't > > want to have some completely made-up new infrastructure for something that > > we _already_ want to handle totally differently for init time. > > > > IOW, I argue very strongly against making up something PM-specific, when > > there really doesn't seem to be much of an advantage. We're much better > > off trying to share the init code than making up something new. > > If I understand correctly, what you're suggesting is impractical. You > would have each driver responsible for resuming the devices it > registers. If it registered some children synchronously (during the > parent's probe) then it would resume them synchronously (during the > parent's resume); if it registered them asynchronously then it would > resume them asynchronously. In essence, every single device_add() or > device_register() call would have to be paired with a resume call. > > To make such significant changes in every driver would be prohibitively > difficult. What we need is a compromise which gives drivers control > over the resume process without making them responsible for actually > carrying it out. > > So consider this suggestion: Let's define PM groups. Each device > belongs to a group, and each group (except group 0, the initial group) > has an owner device. 
By default a device is added to its parent's > group during registration, but the driver can request that it be > assigned to a different group, which must be owned by that parent. > > During resume, each PM group would correspond to an async task. The > devices in each group would be resumed sequentially, in order of > registration, but asynchronously with respect to other groups. The > async thread to resume a group would be launched after the group's > owner device was resumed. > yes, we've talked about something similar to this before. :) Hi, Linus, can you please look at this patch set and see if the idea is right? http://marc.info/?l=linux-kernel&m=124840449826386&w=2 http://marc.info/?l=linux-acpi&m=124840456826456&w=2 http://marc.info/?l=linux-acpi&m=124840456926459&w=2 http://marc.info/?l=linux-acpi&m=124840457026468&w=2 http://marc.info/?l=linux-acpi&m=124840457126471&w=2 If yes, I'll pick them up again and rework a patch set, including some good thoughts from Rafael. thanks, rui > So for example, the sibling functions on a PCI card could all be > assigned to the same group, but different cards could belong to > different groups. Likewise for ATA and PCMCIA controllers. Extra > cross-group constraints could be added if needed, but there should be > relatively few of them. > > This way drivers can decide which of their devices will be resumed in > sequence or concurrently, but they won't have to do any of the > necessary work. > > Alan Stern > > -- > To unsubscribe from this list: send the line "unsubscribe linux-acpi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 3:57 ` Zhang Rui @ 2009-12-07 5:57 ` Linus Torvalds 2009-12-07 6:15 ` Linus Torvalds ` (3 more replies) 0 siblings, 4 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-07 5:57 UTC (permalink / raw) To: Zhang Rui Cc: Alan Stern, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Mon, 7 Dec 2009, Zhang Rui wrote: > > Hi, Linus, > can you please look at this patch set and see if the idea is right? > http://marc.info/?l=linux-kernel&m=124840449826386&w=2 > http://marc.info/?l=linux-acpi&m=124840456826456&w=2 > http://marc.info/?l=linux-acpi&m=124840456926459&w=2 > http://marc.info/?l=linux-acpi&m=124840457026468&w=2 > http://marc.info/?l=linux-acpi&m=124840457126471&w=2 So I'm not entirely sure about that patch-set, but the thing I like about it is how drivers really sign up to it one by one, rather than having all PCI devices automatically signed up for async behavior. That said, the thing I don't like about it is some of the same thing I don't necessarily like about the series in Rafael's tree either: it looks rather over-designed with the whole infrastructure for async device logic (your patch in http://marc.info/?l=linux-acpi&m=124840456926459&w=2). How would you explain that whole async_dev_register() logic in simple terms to somebody else? (I think yours is simpler that the one in the PM tree, but I dunno. I've not really compared the two). So let me explain my dislike by trying to outline some conceptually simple thing that doesn't have any call-backs, doesn't have any "classes", doesn't require registration etc. It just allows drivers at any level to decide to do some things (not necessarily everything) asynchronously. Here's the outline: - first off: drivers that don't know that they nest clearly don't do anything asynchronous. No "PCI devices can be done in parallel" crap, because they really can't - not in the general case. 
So just forget about that kind of logic entirely: it's just wrong. - the 'suspend' thing is a depth-first tree walk. As we suspend a node, we first suspend the child nodes, and then we suspend the node itself. Everybody agrees about that, right? - Trivial "async rule": the tree is walked synchronously, but as we walk it, any point in the tree may decide to do some or all of its suspend asynchronously. For example, when we hit a disk node, the disk driver may just decide that (a) it knows that the disk is an independent thing and (b) it's hierarchical wrt it's parent so (c) it can do the disk suspend asynchronously. - To protect against a parent node being suspended before any async child work has completed, the child suspend - before it kicks off the actual async work - just needs to take a read-lock on the parent (read-lock, because you may have multiple children sharing a parent, and they don't lock each other out). Then the only thing the asynchronous code needs to do is to release the read lock when it is done. - Now, the rule just becomes that the parent has to take a write lock on itself when it suspends itself. That will automatically block until all children are done. Doesn't the above sound _simple_? Doesn't that sound like it should just obviously do the right thing? It sounds like something you can explain as a locking rule without having any complex semantic registration or anything at all. Now, the problem remains that when you walk the device tree starting off all these potentially asynchronous events, you don't want to do that serialization part (the "parent suspend") as you walk the tree - because then you would only ever do one single level asynchronously. Which is why I suggested splitting the suspend into a "pre-suspend" phase (and a "post-resume" one). 
Because then the tree walk goes from # single depth-first thing suspend(root) { for_each_child(root) { // This may take the parent lock for // reading if it does something async suspend(child); } // This serializes with any async children write_lock(root->lock); suspend_one_node(root); write_unlock(root->lock); } to # Phase one: walk the tree synchronously, starting any # async work on the leaves suspend_prepare(root) { for_each_child(root) { // This may take the parent lock for // reading if it does something async suspend_prepare(child); } suspend_prepare_one_node(root); } # Phase two: walk the tree synchronously, waiting for # and finishing the suspend suspend(root) { for_each_child(root) { suspend(child); } // This serializes with any async children started in phase 1 write_lock(root->lock); suspend_one_node(root); write_unlock(root->lock); } and I really think this should work. The advantage: untouched drivers don't change ANY SEMANTICS AT ALL. If they don't have a 'suspend_prepare()' function, then they still see that exact same sequence of 'suspend()' calls. In fact, even if they have children that _do_ have drivers that have that async phase, they'll never know, because that simple write-semaphore trivially guarantees that whether there was async work or not, it will be completed by the time we call 'suspend()'. And drivers that want to do things asynchronously don't need to register or worry: all they do is literally - move their 'suspend()' function to 'suspend_prepare()' instead - add a down_read(dev->parent->lock); async_run(mysuspend, dev); to the point that they want to be asynchronous (which may be _all_ of it or just some slow part). The 'mysuspend' part would be the async part. - add a up_read(dev->parent->lock); to the end of their asynchronous 'mysuspend()' function, so that when the child has finished suspending, the parent down_write() will finally succeed. Doesn't that all sound pretty simple? 
And it has a very clear architecture that is pretty easy to explain to anybody who knows about traversing trees depth-first. No complex concepts. No change to existing tested drivers. No callbacks, no flags, no nothing. And a pretty simple way for a driver to decide: I'll do my suspends asynchronously (without parent drivers really ever even having to know about it). I dunno. Maybe I'm overlooking something, but the above is much closer to what I think would be worth doing. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
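Editorial sketch of the two-phase suspend walk described above, as a user-space model under stated assumptions. The `Node` class, its `is_async` flag and the `wait_for_children()` helper are hypothetical; the per-device rw-semaphore is reduced to a reader count plus a condition variable, and a node's own write lock is modelled only as "wait until no async child still holds a read lock". Suspending a device is a short sleep followed by a log entry.

```python
import threading
import time


class Node:
    def __init__(self, name, parent=None, is_async=False):
        self.name = name
        self.parent = parent
        self.is_async = is_async      # driver opted in to async suspend
        self.children = []
        self._readers = 0             # async children still suspending
        self._cond = threading.Condition()
        if parent is not None:
            parent.children.append(self)

    # Minimal stand-ins for down_read()/up_read()/down_write() on a
    # per-device rw-semaphore. Only what this sketch needs is modelled.
    def down_read(self):
        with self._cond:
            self._readers += 1

    def up_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def wait_for_children(self):      # plays the role of the parent's down_write()
        with self._cond:
            while self._readers:
                self._cond.wait()


log = []
log_lock = threading.Lock()


def suspend_one_node(node):
    time.sleep(0.01)                  # pretend the hardware is slow
    with log_lock:
        log.append(node.name)


def suspend_prepare(node):
    """Phase one: walk synchronously, start async work where requested."""
    for child in node.children:
        suspend_prepare(child)
    if node.is_async and node.parent is not None:
        node.parent.down_read()       # taken before the work is kicked off

        def work():
            node.wait_for_children()  # only matters if this node has children
            suspend_one_node(node)
            node.parent.up_read()     # lets the parent's write lock succeed

        threading.Thread(target=work).start()


def suspend(node):
    """Phase two: walk synchronously, finish whatever is still pending."""
    for child in node.children:
        suspend(child)
    node.wait_for_children()          # serializes with async children
    if not (node.is_async and node.parent is not None):
        suspend_one_node(node)        # untouched drivers see the usual call


root = Node("root")
hub = Node("hub", root)
Node("disk1", hub, is_async=True)
Node("disk2", hub, is_async=True)
Node("nic", root)

suspend_prepare(root)
suspend(root)
print(log)
```

In this model the two disks overlap with each other, while "hub", "nic" and "root" keep the same relative order they would have had without any async work, which is the property the message argues for.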
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 5:57 ` Linus Torvalds @ 2009-12-07 6:15 ` Linus Torvalds 2009-12-17 23:28 ` Benjamin Herrenschmidt 2009-12-07 6:37 ` Arjan van de Ven ` (2 subsequent siblings) 3 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-07 6:15 UTC (permalink / raw) To: Zhang Rui Cc: Alan Stern, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Arjan van de Ven On Sun, 6 Dec 2009, Linus Torvalds wrote: > > And drivers that want to do things asynchronously don't need to register > or worry: all they do is literally [...] Side note: for specific bus implementations, you obviously don't have to even expose the choice. Things like the whole "suspend_late" and "resume_early" phases don't make sense for USB devices, and the USB core layer doesn't even expose those to the various USB drivers. The same is true of the prepare_suspend/suspend split I'm proposing: I suspect that for something like USB, it would make most sense to just do normal node suspend in prepare_suspend, which would do everything asynchronously. Only USB hub devices would get involved at the later 'suspend()' phase. So I'm not suggesting that "all drivers" would necessarily even need changing in order to take advantage of asynchronous behavior. You could change just the _core_ USB layer to do everything automatically for USB devices, and now USB devices would automatically suspend asynchronously not because the generic device layer knows about it, but because the USB bus layer chose to do that "async_run()" on the leaf node suspend functions (or rather: a helper function that calls the leaf-node suspend, and then does the 'up_read()' call on the parent lock: the actual USB drivers would never know about any of this). Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 6:15 ` Linus Torvalds @ 2009-12-17 23:28 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 235+ messages in thread From: Benjamin Herrenschmidt @ 2009-12-17 23:28 UTC (permalink / raw) To: Linus Torvalds Cc: Zhang Rui, Alan Stern, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Arjan van de Ven On Sun, 2009-12-06 at 22:15 -0800, Linus Torvalds wrote: > > The same is true of the prepare_suspend/suspend split I'm proposing: > I > suspect that for something like USB, it would make most sense to just > do > normal node suspend in prepare_suspend, which would do everything > asynchronously. Only USB hub devices would get involved at the later > 'suspend()' phase. Wasn't part of the goal with prepare_suspend() vs. suspend() to handle the problem of backing store vs the VM ? IE. Once any device potentially in the VM path is suspended, things like kmalloc() or gfp() can potentially stall until resume or did we address that recently ? Iirc, part of the idea behind prepare_* is that it's safe vs. the above. Now if you start suspending USB devices at prepare() then you break that assumption since those could be mass storage with dirty mmap'ed pages on them. Now, I'm all for fixing it at the VM/allocator level (if we didn't already) turning pretty much everything into NO_IO once we start suspending devices but that's a whole different matter :-) Cheers, Ben. ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 5:57 ` Linus Torvalds 2009-12-07 6:15 ` Linus Torvalds @ 2009-12-07 6:37 ` Arjan van de Ven 2009-12-07 15:13 ` Alan Stern 2009-12-07 15:15 ` [GIT PULL] PM updates for 2.6.33 Rafael J. Wysocki 3 siblings, 0 replies; 235+ messages in thread From: Arjan van de Ven @ 2009-12-07 6:37 UTC (permalink / raw) To: Linus Torvalds Cc: Zhang Rui, Alan Stern, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Sun, 6 Dec 2009 21:57:55 -0800 (PST) Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Now, the problem remains that when you walk the device tree starting > off all these potentially asynchronous events, you don't want to do > that serialization part (the "parent suspend") as you walk the tree - > because then you would only ever do one single level asynchronously. > Which is why I suggested splitting the suspend into a "pre-suspend" > phase (and a "post-resume" one). Because then the tree walk goes from > I dunno. Maybe I'm overlooking something, but the above is much > closer to what I think would be worth doing. with what you're describing I suspect the current async function calls could be used; in the first tree walk, the drivers do an async_schedule() of the things they want done asynchronous; all the core then needs to do is a full synchronization step between the two tree walks... and we get pretty much all the benefits without needing the read-then-write-lock primitive for synchronization. alternative would be to do the synchronization in the part where we know there's a dependency (like your lock is doing); but instead of a lock we could store the async cookie there; and just wait on that in the 2nd phase.... this would be more finegrained, and an optimization from the "global synchronize"... 
but I'm not sure it'll be worth it in practice; it will if there's significant cost in various parts of the tree AND in the 2nd run; if the 2nd run is cheap in general, you're not going to get real extra parallelism at the price of more complexity. -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 235+ messages in thread
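Editorial sketch of the two synchronization variants discussed in this message, again as a user-space model. A `concurrent.futures.Future` stands in for the async cookie, and waiting on all futures stands in for a full synchronization step between the two tree walks. The device tree, the `ASYNC` set and the function name `suspend_tree` are hypothetical, and async devices are assumed to be leaves.

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical device tree: parent -> children.
CHILDREN = {"root": ["hub", "nic"], "hub": ["disk1", "disk2"]}
ASYNC = {"disk1", "disk2"}      # leaf devices whose drivers opted in


def suspend_tree(fine_grained):
    order = []                  # suspend order; list.append is atomic in CPython
    cookies = {}                # node -> Future, playing the role of the cookie

    with ThreadPoolExecutor() as pool:

        def prepare(node):
            # First walk: async drivers only schedule their work.
            for child in CHILDREN.get(node, []):
                prepare(child)
            if node in ASYNC:
                cookies[node] = pool.submit(order.append, node)

        def suspend(node):
            # Second walk: everything else is suspended synchronously.
            for child in CHILDREN.get(node, []):
                suspend(child)
                if fine_grained and child in cookies:
                    cookies[child].result()   # wait on this child's cookie only
            if node not in ASYNC:
                order.append(node)

        prepare("root")
        if not fine_grained:
            wait(cookies.values())            # one global barrier between walks
        suspend("root")
    return order


print(suspend_tree(fine_grained=False))
print(suspend_tree(fine_grained=True))
```

Both variants give the same ordering guarantees here. The fine-grained one can only pay off when the second walk itself does expensive work that may overlap with async work elsewhere in the tree, which is the trade-off the message raises.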
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 5:57 ` Linus Torvalds 2009-12-07 6:15 ` Linus Torvalds 2009-12-07 6:37 ` Arjan van de Ven @ 2009-12-07 15:13 ` Alan Stern 2009-12-07 16:31 ` Linus Torvalds 2009-12-07 15:15 ` [GIT PULL] PM updates for 2.6.33 Rafael J. Wysocki 3 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-07 15:13 UTC (permalink / raw) To: Linus Torvalds Cc: Zhang Rui, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Sun, 6 Dec 2009, Linus Torvalds wrote: > # Phase one: walk the tree synchronously, starting any > # async work on the leaves > suspend_prepare(root) > { > for_each_child(root) { > // This may take the parent lock for > // reading if it does something async > suspend_prepare(child); > } > suspend_prepare_one_node(root); > } > > # Phase two: walk the tree synchronously, waiting for > # and finishing the suspend > suspend(root) > { > for_each_child(root) { > suspend(child); > } > // This serializes with any async children started in phase 1 > write_lock(root->lock); > suspend_one_node(root); > write_unlock(root->lock); > } > > and I really think this should work. > No complex concepts. No change to existing tested drivers. No callbacks, > no flags, no nothing. And a pretty simple way for a driver to decide: I'll > do my suspends asynchronously (without parent drivers really ever even > having to know about it). > > I dunno. Maybe I'm overlooking something, but the above is much closer to > what I think would be worth doing. You're overlooking resume. It's more difficult than suspend. The issue is that a child can't start its async part until the parent's synchronous part is finished. So for example, suppose the device listing contains P, C, Q, where C is a child of P, Q is unrelated, and P has a long-lasting asynchronous requirement. The resume process will stall upon reaching C, waiting for P to finish. Thus even though P and Q might be able to resume in parallel, they won't get the chance. 
An approach that handles resume well can probably be adapted to handle suspend too. The reverse isn't true, as this example shows. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
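Editorial sketch putting numbers on the P, C, Q example above. It is plain arithmetic over a made-up timeline, not a model of any real code: the costs (10, 1, 10 time units), the `walker_blocks` switch and the function name `resume_timeline` are assumptions chosen for illustration. With `walker_blocks=True` the list walk itself waits for a parent to finish before going on; with `walker_blocks=False` only the child's own task waits.

```python
def resume_timeline(devices, walker_blocks):
    """Return {name: finish_time}.

    devices: list of (name, parent, cost, is_async) in list order.
    """
    t = 0.0                                # current time of the list walk
    finish = {}
    for name, parent, cost, is_async in devices:
        ready = finish.get(parent, 0.0)    # when the parent is fully resumed
        if walker_blocks:
            t = max(t, ready)              # the walk itself stalls here
            start = t
        else:
            start = max(t, ready)          # only this device's task waits
        finish[name] = start + cost
        if not is_async:
            t = finish[name]               # synchronous work holds up the walk
    return finish


devices = [
    ("P", None, 10.0, True),   # long-lasting asynchronous resume
    ("C", "P", 1.0, True),     # child of P
    ("Q", None, 10.0, True),   # unrelated to P
]

stalled = resume_timeline(devices, walker_blocks=True)
overlapped = resume_timeline(devices, walker_blocks=False)
print(stalled)      # Q finishes at 20.0: it could not start until P was done
print(overlapped)   # Q finishes at 10.0: P and Q ran in parallel
```

Under these assumed costs the stalled walk takes 20 units in total and the overlapped one 11, which is the lost parallelism the message is pointing at.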
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 15:13 ` Alan Stern @ 2009-12-07 16:31 ` Linus Torvalds 2009-12-07 16:55 ` Linus Torvalds 2009-12-07 17:52 ` Alan Stern 0 siblings, 2 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-07 16:31 UTC (permalink / raw) To: Alan Stern Cc: Zhang Rui, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Mon, 7 Dec 2009, Alan Stern wrote: > > > > I dunno. Maybe I'm overlooking something, but the above is much closer to > > what I think would be worth doing. > > You're overlooking resume. It's more difficult than suspend. The > issue is that a child can't start its async part until the parent's > synchronous part is finished. No, I haven't overlooked resume at all. I just assumed that it was obvious. It's the exact same thing, except in reverse (the locking ends up being slightly different, but the changes are actually fairly straightforward). And by reverse, I mean that you walk the tree in the reverse order too, exactly like we already do - on suspend we walk it children-first, on resume we walk it parents-first (small detail: we actually just walk a simple linked list, but the list is topologically ordered, so walking it forwards/backwards is topologically the same thing as doing that depth-first search). > So for example, suppose the device listing contains P, C, Q, where C is > a child of P, Q is unrelated, and P has a long-lasting asynchronous > requirement. The resume process will stall upon reaching C, waiting > for P to finish. Thus even though P and Q might be able to resume in > parallel, they won't get the chance. No. The resume process does EXACTLY THE SAME THING as I outlined for suspend, but just all in reverse.
So now the resume process becomes the same two-phase thing:

    # Phase one
    resume(root)
    {
        // This can do things asynchronously if it wants,
        // but needs to take the write lock on itself until
        // it is done if it does
        resume_one_node(root);
        for_each_child(root)
            resume(child);
    }

    # Phase two
    post_resume(root)
    {
        post_resume_one_node(root);
        for_each_child(root)
            post_resume(child);
    }

Notice? It's _exactly_ the same thing as suspend - except all turned around. We do the nodes before the children ("walk the list backwards"), and we also do the locking the other way around (ie on suspend we'd lock the _parent_ if we wanted to do async stuff - to keep it around - but on resume we lock _ourselves_, so that the children can have something to wait on. Also note how we take a _write_ lock rather than a read lock).

(And again, I've only written it out in email, I've not tested it or thought about it all that deeply, so you'll excuse any stupid thinkos.)

Now, for something like PCI, I'd suggest (once more) leaving all drivers totally unchanged, and you end up with the exact same behavior as we had before (no real change to the whole resume ordering, and everything is synchronous so there is no relevant locking).

But how would the USB layer do this?

Simple: all the normal leaf devices would have their resume callback be called at "post_resume()" time (exactly the reverse of the suspend phase: we suspend early, and we resume late - it's all a mirror image).

And I'd suggest that the USB layer do it all totally asynchronously, except again turned around the other way. Remember how on suspend, the suspend of a leaf device ended up being an issue of asynchronously calling a function that did the suspend, and then released the read-lock of the parent.
Resume is the same, except now we'd actually want to take the parent read-lock asynchronously too, so you'd do

    down_write(leaf->lock);
    async_schedule(usb_node_resume, leaf);

where that function simply does

    usb_node_resume(node)
    {
        /* Wait for the parent to have resumed completely */
        down_read(node->parent->lock);
        node->resume(node);
        up_read(node->parent->lock);
        up_write(node->lock);
    }

and you're all done. Once more the ordering and the locking take care of any need to serialize - there are no data structures to keep track of.

And what about USB hubs? They get resumed in the first phase (again, exactly the mirror image of the suspend), and the only thing they need to do is that _exact_ same thing above:

    down_write(hub->lock);
    async_schedule(usb_node_resume, hub);

- Ta-daa! All done.

Notice? It's really pretty straightforward, and there are _zero_ new concepts. And again, no callbacks, no nothing. Just the obvious mirror image of what happened when suspending. We do everything with simple async calls. And none of the tree walking actually blocks (yes, we do a "down_write()" on the nodes as we schedule the resume code, but it won't be a blocking one, since that is the first time we encounter that node: the blocking will come later when the async threads actually need to wait for things).

Again, I do not guarantee that I've dotted every i, and crossed every t. It's just that I'm pretty sure that we really don't need any fancy "infrastructure" for something this simple. And I really much prefer "conceptually simple high-level model" over a model of "keep track of all the relationships and have some complex model of devices".

So let's just look at your example:

> So for example, suppose the device listing contains P, C, Q, where C is
> a child of P, Q is unrelated, and P has a long-lasting asynchronous
> requirement.

The tree is:

    ... -> P -> C
        -> Q

and with what I suggest, during phase one, P will asynchronously start the resume.
As part of its async resume it will have to wait for its parents, of course, but all of that happens in a separate context, and the tree traversal goes on.

And during phase #1, C and Q won't do anything at all. We _could_ do them during this phase, and it would actually all work out fine, but we wouldn't want to do that for a simple reason: we _want_ the pre_suspend and post_resume phases to be total mirror images, because if we end up doing error handling for the pre-suspend case, then the post-resume phase would be the "fixup" for it, so we actually want leaf things to happen during phase #2 - not because it would screw up locking or ordering, but because of other issues.

When we hit phase #2, we then do C and Q, and do the same thing - we have an async call that does the read-lock on the parent to make sure it's all resumed, and then we resume C and Q. And they'll automatically resume in parallel (unless C is waiting for P, of course, in which case P and Q end up resuming in parallel, and C ends up waiting).

Now, the above just takes care of the inter-device ordering. There are unrelated semantics we want to give, like "all devices will have resumed before we start waking up user space". Those are unrelated to the topological requirements, of course, and are not a requirement imposed by the device tree, but by our _other_ semantics (IOW, in this respect it's kind of like how we wanted pre-suspend and post-resume to be mirror images for other outside reasons).

So we'd actually have a "phase #3", but that phase wouldn't be visible to the devices themselves, it would be a

    # Phase three: make sure everything is resumed
    for_each_device() {
        read_lock(dev->lock);
        read_unlock(dev->lock);
    }

but as you can see, there's no actual device callbacks involved. It would be just the core device layer saying "ok, now I'm going to wait for all the devices to have finished their resume".

Linus
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 16:31 ` Linus Torvalds @ 2009-12-07 16:55 ` Linus Torvalds 2009-12-07 17:52 ` Alan Stern 1 sibling, 0 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-07 16:55 UTC (permalink / raw) To: Alan Stern Cc: Zhang Rui, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list

On Mon, 7 Dec 2009, Linus Torvalds wrote:
>
> And during phase #1, C and Q won't do anything at all. We _could_ do them
> during this phase, and it would actually all work out fine, but we
> wouldn't want to do that for a simple reason: we _want_ the pre_suspend
> and post_resume phases to be total mirror images, because if we end up
> doing error handling for the pre-suspend case, then the post-resume phase
> would be the "fixup" for it, so we actually want leaf things to happen
> during phase #2 - not because it would screw up locking or ordering, but
> because of other issues.

Ho humm. This part made me think. Since I started mulling over the fact that we could do the resume thing in a single phase (and really only wanted the second phase in order to be a mirror image to the suspend), I started thinking that we could perhaps do even the suspend with a single phase, and avoid introducing that pre-suspend/post-resume phase at all.

And now that I think about it, we can do that by simply changing the locking just a tiny bit. I originally envisioned that two-phase suspend because I was thinking that the first phase would start off the suspend, and the second phase would finish it, but we can actually do it all with a single phase that does both.
So starting with just the regular depth-first post-ordering that is a suspend:

    suspend(root)
    {
        for_each_child(root)
            suspend(child);
        suspend_one_node(root);
    }

the rule would be that for something like USB that wants to do the suspend asynchronously, the node suspend routine would do

    usb_node_suspend(node)
    {
        // Make sure parent doesn't suspend: this will not block,
        // because we'll call the 'suspend' function for all nodes
        // before we call it for the parent.
        down_read(node->parent->lock);

        // Do the part that may block asynchronously
        async_schedule(do_usb_node_suspend, node);
    }

    do_usb_node_suspend(node)
    {
        // Start our suspend. This will block if we have any
        // children that are still busy suspending (they will
        // have done a down_read() in their suspend).
        down_write(node->lock);
        node->suspend(node);
        up_write(node->lock);

        // This lets our parent continue
        up_read(node->parent->lock);
    }

and it looks like we don't even need a second phase at all.

IOW, I think USB could do this on its own right now, with no extra infrastructure from the device layer AT ALL, except for one small thing: that new "rwsem" lock in the device data structure, and then we'd need the "wait for everybody to have completed" loop, ie

    for_each_dev(dev) {
        down_write(dev->lock);
        up_write(dev->lock);
    }

thing at the end of the suspend loop (same thing as I mentioned about resuming).

So I think even that whole two-phase thing was unnecessarily complicated.

Linus
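The single-pass suspend outlined in this message can be tried out in userspace. What follows is only a minimal sketch under several assumptions: POSIX threads stand in for async_schedule(), a hand-rolled `struct rwsem` stands in for the kernel rwsem (pthread_rwlock_t would not do, since the read lock is taken by the walking thread and dropped by the async one), and `struct node`, `node_init()` and `suspend_all()` are hypothetical names, not kernel interfaces. It waits for completion by joining the threads rather than by cycling each lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Toy rwsem: unlike pthread_rwlock_t it may be released by a thread
 * other than the one that acquired it, which this scheme relies on.
 */
struct rwsem {
	pthread_mutex_t mu;
	pthread_cond_t cv;
	int readers;
	int writer;
};

struct node {
	struct rwsem lock;
	struct node *parent;	/* NULL for the root */
	pthread_t thread;
	int order;		/* position in which the node suspended */
};

static pthread_mutex_t order_mu = PTHREAD_MUTEX_INITIALIZER;
static int next_order;

static void down_read(struct rwsem *s)
{
	pthread_mutex_lock(&s->mu);
	while (s->writer)
		pthread_cond_wait(&s->cv, &s->mu);
	s->readers++;
	pthread_mutex_unlock(&s->mu);
}

static void up_read(struct rwsem *s)
{
	pthread_mutex_lock(&s->mu);
	s->readers--;
	pthread_cond_broadcast(&s->cv);
	pthread_mutex_unlock(&s->mu);
}

static void down_write(struct rwsem *s)
{
	pthread_mutex_lock(&s->mu);
	while (s->writer || s->readers)
		pthread_cond_wait(&s->cv, &s->mu);
	s->writer = 1;
	pthread_mutex_unlock(&s->mu);
}

static void up_write(struct rwsem *s)
{
	pthread_mutex_lock(&s->mu);
	s->writer = 0;
	pthread_cond_broadcast(&s->cv);
	pthread_mutex_unlock(&s->mu);
}

static void node_init(struct node *n, struct node *parent)
{
	pthread_mutex_init(&n->lock.mu, NULL);
	pthread_cond_init(&n->lock.cv, NULL);
	n->lock.readers = n->lock.writer = 0;
	n->parent = parent;
	n->order = 0;
}

/* Async part: blocks until every child has dropped its read lock. */
static void *do_node_suspend(void *arg)
{
	struct node *n = arg;

	down_write(&n->lock);
	pthread_mutex_lock(&order_mu);
	n->order = ++next_order;	/* stands in for the real suspend work */
	pthread_mutex_unlock(&order_mu);
	up_write(&n->lock);
	if (n->parent)
		up_read(&n->parent->lock);	/* lets the parent continue */
	return NULL;
}

/* Walk the list backwards (children first), then wait for all tasks. */
static void suspend_all(struct node **list, size_t n)
{
	for (size_t i = n; i-- > 0; ) {
		int err;

		if (list[i]->parent)
			down_read(&list[i]->parent->lock);	/* never blocks here */
		err = pthread_create(&list[i]->thread, NULL,
				     do_node_suspend, list[i]);
		assert(err == 0);
	}
	for (size_t i = 0; i < n; i++)
		pthread_join(list[i]->thread, NULL);
}
```

The walking thread never blocks on down_read(), because a node's write lock is only taken by its own async task, and that task is created after all of the node's children have been visited. Whatever the thread scheduling, each node's `order` ends up larger than that of all its children.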
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 16:31 ` Linus Torvalds 2009-12-07 16:55 ` Linus Torvalds @ 2009-12-07 17:52 ` Alan Stern 2009-12-07 18:05 ` Linus Torvalds 1 sibling, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-07 17:52 UTC (permalink / raw) To: Linus Torvalds Cc: Zhang Rui, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Mon, 7 Dec 2009, Linus Torvalds wrote: > No, I haven't overlooked resume at all. I just assumed that it was > obvious. It's the exact same thing, except in reverse (the locking ends > up being slightly different, but the changes are actually fairly > straightforward). > > And by reverse, I mean that you walk the tree in the reverse order too, > exactly like we already do - on suspend we walk it children-first, on > resume we walk it parents-first (small detail: we actually just walk a > simple linked list, but the list is topologically ordered, so walking it > forwards/backwards is topologically the same thing as doing that > depth-first search). > Notice? It's _exactly_ the same thing as suspend - except all turned > around. We do the nodes before the children ("walk the list backwards"), > and we also do the locking the other way around (ie on suspend we'd lock > the _parent_ if we wanted to do async stuff - to keep it around - but on > resume we lock _ourselves_, so that the children can have something to > wait on. Also note how we take a _write_ lock rather than a read lock). Okay, I think I've got it. But you're wrong about one thing: Resume isn't _exactly_ the reverse of suspend. For both of them we have to start the async thread in the first pass. So instead of resume/post_resume we would have pre_resume/resume, just like pre_suspend/suspend. During the pre- pass, the driver launches an async thread and takes the appropriate locks. 
The thread does its work as appropriate (with locking to ensure that it first waits for children or parents), and then in the second pass the driver waits for the async thread to finish.

A non-async driver (i.e., most of them) would ignore the pre- pass entirely and do all its work in the second pass. An async-aware driver would look like this:

    pre_suspend(dev)
    {
        /* Prevent parent from suspending until we are ready */
        down_read(dev->parent->lock);
        dev->pm_cookie = async_schedule(async_suspend, dev);
    }

    async_suspend(dev)
    {
        /* Wait until all children are fully suspended */
        down_write(dev->lock);
        Suspend dev, taking as much time as needed
        up_write(dev->lock);

        /* Allow parent to suspend */
        up_read(dev->parent->lock);
    }

    suspend(dev)
    {
        /* Wait until the suspend is complete */
        async_synchronize_cookie(dev->pm_cookie);
    }

    pre_resume(dev)
    {
        /* Prevent children from resuming */
        down_write(dev->lock);
        dev->pm_cookie = async_schedule(async_resume, dev);
    }

    async_resume(dev)
    {
        /* Wait until parent is fully resumed */
        down_read(dev->parent->lock);
        Resume dev, taking as much time as needed
        up_read(dev->parent->lock);

        /* Allow children to resume */
        up_write(dev->lock);
    }

    resume(dev)
    {
        /* Wait until resume is complete */
        async_synchronize_cookie(dev->pm_cookie);
    }

So there's some time symmetry here, but it isn't perfect. This is probably what you had in mind all along, but I needed to get it straight.

There's some question about what to do if a suspend or resume fails. A bunch of async threads will have been launched for other devices, but now there won't be anything to wait for them. It's not clear how this should be handled.

Alan Stern
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 17:52 ` Alan Stern @ 2009-12-07 18:05 ` Linus Torvalds 2009-12-07 20:37 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-07 18:05 UTC (permalink / raw) To: Alan Stern Cc: Zhang Rui, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Mon, 7 Dec 2009, Alan Stern wrote: > > Okay, I think I've got it. But you're wrong about one thing: Resume > isn't _exactly_ the reverse of suspend. Yeah, no. But I think I made it much closer by getting rid of pre-suspend and post-resume (my next email to the one you quoted). And yeah, I started thinking along those lines exactly because it wasn't as clean a mirror image as I thought it should be able to be. > A non-async driver (i.e., most of them) would ignore the pre- pass > entirely and do all its work in the second pass. See my second email, where I think I can get rid of the whole second pass thing. I think you'll agree that it's an even nicer mirror image. > There's some question about what to do if a suspend or resume fails. A > bunch of async threads will have been launched for other devices, but > now there won't be anything to wait for them. It's not clear how this > should be handled. I think the rule for "suspend fails" is very simple: you can't fail in the async codepath. There's no sane way to return errors, and trying to would be too complex anyway. What would you do? In fact, even though we _can_ fail in the synchronous path, I personally consider a device driver that ever fails its suspend to be terminally broken. We're practically always better off suspending and simply turning off the power than saying "uh, I failed the suspend". 
I've occasionally hit a few drivers that caused suspend failures, and each and every time it was a driver bug, and the right thing to do was to just ignore the error and suspend anyway - returning an error code and trying to undo the suspend is not what anybody ever really wants, even if our model _allows_ for it. (And the rule for "resume fails" is even simpler: there's nothing we can really do if something fails to resume - and that's true whether the failure is synchronous or asynchronous. The device is dead. Try to reset it, or remove it from the device tree. Tough). Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 18:05 ` Linus Torvalds @ 2009-12-07 20:37 ` Alan Stern 2009-12-07 20:48 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-07 20:37 UTC (permalink / raw) To: Linus Torvalds Cc: Zhang Rui, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Mon, 7 Dec 2009, Linus Torvalds wrote: > See my second email, where I think I can get rid of the whole second pass > thing. I think you'll agree that it's an even nicer mirror image. Yes, I like this approach better and better. There is still a problem. In your code outlines, you have presented a classic depth-first (suspend) or depth-last (resume) tree algorithm. But that's not how the PM core works. Instead it maintains dpm_list, a list of all devices in order of registration. Suspends and resumes are carried out by iterating along this list, in the reverse and forward directions respectively. There are two advantages. The matter of stack usage, of course. But more importantly, this order of devices is guaranteed to work. For any device D, we _know_ that the system can function properly in circumstances where everything on dpm_list before D is active and everything after D is inactive -- because that's the state the system was in when D was registered. Any other order risks errors because of unknown dependencies. The consequence is that there's no way to hand off an entire subtree to an async thread. And as a result, your single-pass algorithm runs into the kind of "stall" problem I described before. (In theory we could convert over to a tree algorithm. IMO that would be nearly as dangerous as going to a full-fledged totally async scheme.) But all is not lost. We can still get what we want using a two-pass list algorithm, where one of the passes is contained within the PM core -- no extra callbacks are needed. 
Here's how suspend would work:

    dpm_suspend()    /* Suspend all devices on dpm_list */
    {
        list_for_each_entry_reverse(dev, dpm_list, ...) {
            /* Make the parent wait for dev */
            down_read(dev->parent->lock);
            if (dev->async_pm)
                async_schedule(device_suspend, dev);
        }
        list_for_each_entry_reverse(dev, dpm_list, ...) {
            if (!dev->async_pm)
                device_suspend(dev);
        }
        async_synchronize_full();
    }

    device_suspend(dev)    /* Suspend a single device */
    {
        /* Wait until all the children are suspended */
        down_write(dev->lock);
        dev->bus->suspend(dev);
        up_write(dev->lock);

        /* Tell the parent we are finished */
        up_read(dev->parent->lock);
    }

I have glossed over a bunch of details, such as the fact that device_suspend() really takes two arguments. And it's necessary to be more careful with the list operations than shown here, because devices can be unregistered while all this is going on.

Still, this seems reasonable. Bus subsystems and drivers can set the dev->async_pm flag as desired, and they can use the new rwsems to handle special dependencies without involving the PM core. No new callbacks are needed, nor any changes to existing methods. (Convincing lockdep that all this fancy footwork is valid may require some effort, though.)

By the way, this bears a striking resemblance to Rafael's patch. The biggest difference is the use of the new rwsem for dependency resolution, instead of his somewhat cumbersome constraint structures.

> > There's some question about what to do if a suspend or resume fails. A
> > bunch of async threads will have been launched for other devices, but
> > now there won't be anything to wait for them. It's not clear how this
> > should be handled.
>
> I think the rule for "suspend fails" is very simple: you can't fail in the
> async codepath. There's no sane way to return errors, and trying to would
> be too complex anyway. What would you do?

You could prevent the suspend procedure from going any further and abort the entire system sleep. If you wanted to.
> In fact, even though we _can_ fail in the synchronous path, I personally > consider a device driver that ever fails its suspend to be terminally > broken. We're practically always better off suspending and simply turning > off the power than saying "uh, I failed the suspend". > > I've occasionally hit a few drivers that caused suspend failures, and each > and every time it was a driver bug, and the right thing to do was to just > ignore the error and suspend anyway - returning an error code and trying > to undo the suspend is not what anybody ever really wants, even if our > model _allows_ for it. There is a valid reason for aborting a sleep transition: the driver has received a remote wakeup request. Wakeup requests race with sleep, of course. A request coming after the system is asleep will wake it up; one coming before the system is asleep should either cause it to wake up immediately after shutting down or prevent the sleep entirely. Causing the system to wake up immediately needs hardware support. But by the time the kernel is aware of a wakeup request, the request is generally no longer present in the hardware. (For example, an interrupt has been delivered and the IRQ line is no longer active.) So the only remaining choice is to abort the sleep transition. > (And the rule for "resume fails" is even simpler: there's nothing we can > really do if something fails to resume - and that's true whether the > failure is synchronous or asynchronous. The device is dead. Try to reset > it, or remove it from the device tree. Tough). Right. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 20:37 ` Alan Stern @ 2009-12-07 20:48 ` Linus Torvalds 2009-12-07 21:32 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-07 20:48 UTC (permalink / raw) To: Alan Stern Cc: Zhang Rui, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list

On Mon, 7 Dec 2009, Alan Stern wrote:
>
> Yes, I like this approach better and better.
>
> There is still a problem. In your code outlines, you have presented a
> classic depth-first (suspend) or depth-last (resume) tree algorithm.

Yes, I did that because that clarifies the locking rules (ie "we traverse parent nodes last/first"), not because it was actually relevant to anything else.

And the whole pre-order vs post-order is important, and really only shows up when you show the pseudo-code as a tree walk.

> But that's not how the PM core works. Instead it maintains dpm_list, a
> list of all devices in order of registration.

Right. I did mention that in a couple of the asides, I'm well aware that we don't actually traverse the tree as a tree. But the "traverse list forward" is logically the same thing as doing a pre-order DFS, while going backwards is equivalent to doing a post-order DFS, since all we really care about is the whole "parent first" or "children first" part of the ordering.

So I wanted to show the logic in pseudo-code using the tree walk (because that explains the logic _conceptually_ much better), but the actual code would just do the list traversal.

> The consequence is that there's no way to hand off an entire subtree to
> an async thread. And as a result, your single-pass algorithm runs into
> the kind of "stall" problem I described before.

No, look again. There's no stall in the thing, because all it really depends on is (for the suspend path) is that it sees all children before the parent (because the child will do a "down_read()" on the parent node and that should not stall), and for the resume path it depends on seeing the parent node before any children (because the parent node does that "down_write()" on its own node).

Everything else is _entirely_ asynchronous, including all the other locks it takes. So there are no stalls (except, of course, if we then hit limits on numbers of outstanding async work and refuse to create too many outstanding async things, but that's a separate issue, and intentional, of course).

You're right that my first one (two-phase suspend) had a stall situation.

Linus
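The property both sides rely on here, that a list kept in registration order visits parents first when walked forwards and children first when walked backwards, can be checked mechanically. A minimal sketch, assuming a hypothetical `struct dev` that records its parent by list index; neither the struct nor `parents_first()` is the kernel's `struct device` or a dpm_list helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not the kernel's struct device. */
struct dev {
	const char *name;
	int parent;	/* index of the parent in the list, or -1 for none */
};

/*
 * Returns true if every device appears after its parent. In that case
 * walking the list forwards visits parents first (resume order) and
 * walking it backwards visits children first (suspend order).
 */
static bool parents_first(const struct dev *list, size_t n)
{
	assert(list != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (list[i].parent >= 0 && (size_t)list[i].parent >= i)
			return false;
	}
	return true;
}
```

A list such as A, P, B, Q with P under A and Q under B passes the check; a list that names a child before its parent does not. The check says nothing about dependencies that are not expressed as parent/child links.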
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 20:48 ` Linus Torvalds @ 2009-12-07 21:32 ` Alan Stern 2009-12-07 21:41 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-07 21:32 UTC (permalink / raw) To: Linus Torvalds Cc: Zhang Rui, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list

On Mon, 7 Dec 2009, Linus Torvalds wrote:
> > The consequence is that there's no way to hand off an entire subtree to
> > an async thread. And as a result, your single-pass algorithm runs into
> > the kind of "stall" problem I described before.
>
> No, look again. There's no stall in the thing, because all it really
> depends on is (for the suspend path) is that it sees all children before
> the parent (because the child will do a "down_read()" on the parent node
> and that should not stall), and for the resume path it depends on seeing
> the parent node before any children (because the parent node does that
> "down_write()" on its own node).
>
> Everything else is _entirely_ asynchronous, including all the other locks
> it takes. So there are no stalls (except, of course, if we then hit limits
> on numbers of outstanding async work and refuse to create too many
> outstanding async things, but that's a separate issue, and intentional, of
> course).

It only seems that way because you didn't take into account devices that suspend synchronously but whose children suspend asynchronously.

A synchronous suspend routine for a device with async child suspends would have to look just like your usb_node_suspend():

    suspend_one_node(dev)
    {
        /* Wait until the children are suspended */
        down_write(dev->lock);
        Suspend dev
        up_write(dev->lock);

        /* Allow the parent to suspend */
        up_read(dev->parent->lock);
    }

So now suppose we've got two USB host controllers, A and B. They are PCI devices, so they suspend synchronously. Each has a root hub child (P and Q respectively) which is a USB device and therefore suspends asynchronously. dpm_list contains: A, P, B, Q.

(In fact A doesn't enter into this discussion; you can ignore it.)

In your one-pass algorithm, we start with usb_node_suspend(Q). It does down_read(B->lock) and starts an async task for Q. Then we move on to suspend_one_node(B). It does down_write(B->lock) and blocks until the async task finishes; then it suspends B. Finally we move on to usb_node_suspend(P), which does down_read(A->lock) and starts an async task for P.

The upshot is that P is stuck waiting for Q to suspend, even though it should have been able to suspend in parallel. This is simply because P precedes B in the list, and B is synchronous and must wait for Q to finish.

With my two-pass algorithm, we start with Q. The first loop does down_read(B->lock) and starts an async task for Q. We move on to B and do down_read(B->parent->lock), nothing more. Then we move on to P, with down_read(A->lock) and start an async task for P. Finally we do down_read(A->parent->lock). Notice that now there are two async tasks, for P and Q, running in parallel.

The second pass waits for Q to finish before suspending B synchronously, and waits for P to finish before suspending A synchronously. This is unavoidable. The point is that it allows P and Q to suspend at the same time, not one after the other as in the one-pass scheme.

Alan Stern
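The A, P, B, Q example can be worked through with a small timing model. This is a sketch under stated assumptions, not PM core code: `struct dev`, `suspend_time()` and `MAX_DEVS` are hypothetical names, the per-device costs are made-up units, and lock and scheduling overhead are ignored. A synchronous device occupies the thread walking the list; an async device starts when its task is launched and all of its children have finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 16

/* Hypothetical device description; not the kernel's struct device. */
struct dev {
	const char *name;
	int parent;	/* index of the parent in the list, or -1 */
	bool async;	/* suspends in its own async task */
	int cost;	/* time units its suspend callback takes */
};

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/*
 * Returns the time at which the last device finishes suspending.
 * The list is in registration order (parents before children) and is
 * walked backwards. With two_pass set, every async task is launched at
 * time 0; otherwise it is launched when the walk reaches the device.
 */
static int suspend_time(const struct dev *list, size_t n, bool two_pass)
{
	int children_done[MAX_DEVS] = { 0 };	/* when all children are done */
	int now = 0;	/* clock of the thread walking the list */
	int end = 0;

	assert(n <= MAX_DEVS);
	for (size_t i = n; i-- > 0; ) {
		const struct dev *d = &list[i];
		int launch, start, finish;

		assert(d->parent < (int)i);	/* parents come first */
		launch = (d->async && two_pass) ? 0 : now;
		start = max_int(launch, children_done[i]);
		finish = start + d->cost;
		if (!d->async)
			now = finish;	/* the walking thread was busy until here */
		if (d->parent >= 0)
			children_done[d->parent] =
				max_int(children_done[d->parent], finish);
		end = max_int(end, finish);
	}
	return end;
}
```

With the assumed costs of 3 units for each root hub (P, Q) and 1 unit for each controller (A, B), the one-pass walk finishes at time 8, because P is not launched until B is done, while the two-pass walk finishes at time 5, because P and Q overlap.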
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 21:32 ` Alan Stern @ 2009-12-07 21:41 ` Linus Torvalds 2009-12-07 21:47 ` Rafael J. Wysocki ` (2 more replies) 0 siblings, 3 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-07 21:41 UTC (permalink / raw) To: Alan Stern Cc: Zhang Rui, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Mon, 7 Dec 2009, Alan Stern wrote: > > It only seems that way because you didn't take into account devices > that suspend synchronously but whose children suspend asynchronously. But why would I care? If somebody suspends synchronously, then that's what he wants. > A synchronous suspend routine for a device with async child suspends > would have to look just like your usb_node_suspend(): Sure. But that sounds like a "Doctor, it hurts when I do this" situation. Don't do that. Make the USB host controller do its suspend asynchronously. We don't suspend PCI bridges anyway, iirc (but I didn't actually check). And at worst, we can make the PCI _bridges_ know about async suspends, and solve it that way - without actually making any normal PCI drivers do it. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 21:41 ` Linus Torvalds @ 2009-12-07 21:47 ` Rafael J. Wysocki 2009-12-07 22:01 ` Alan Stern 2009-12-07 22:02 ` Rafael J. Wysocki 2 siblings, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-07 21:47 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Monday 07 December 2009, Linus Torvalds wrote: > > On Mon, 7 Dec 2009, Alan Stern wrote: > > > > It only seems that way because you didn't take into account devices > > that suspend synchronously but whose children suspend asynchronously. > > But why would I care? If somebody suspends synchronously, then that's what > he wants. > > > A synchronous suspend routine for a device with async child suspends > > would have to look just like your usb_node_suspend(): > > Sure. But that sounds like a "Doctor, it hurts when I do this" situation. > Don't do that. > > Make the USB host controller do its suspend asynchronously. We don't > suspend PCI bridges anyway, iirc (but I didn't actually check). That's correct, we don't. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 21:41 ` Linus Torvalds 2009-12-07 21:47 ` Rafael J. Wysocki @ 2009-12-07 22:01 ` Alan Stern 2009-12-07 22:06 ` Linus Torvalds 2009-12-07 22:02 ` Rafael J. Wysocki 2 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-07 22:01 UTC (permalink / raw) To: Linus Torvalds Cc: Zhang Rui, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Mon, 7 Dec 2009, Linus Torvalds wrote: > On Mon, 7 Dec 2009, Alan Stern wrote: > > > > It only seems that way because you didn't take into account devices > > that suspend synchronously but whose children suspend asynchronously. > > But why would I care? If somebody suspends synchronously, then that's what > he wants. It doesn't mean he wants to block unrelated devices from suspending asynchronously, merely because they happen to come earlier in the list. > > A synchronous suspend routine for a device with async child suspends > > would have to look just like your usb_node_suspend(): > > Sure. But that sounds like a "Doctor, it hurts when I do this" situation. > Don't do that. > > Make the USB host controller do its suspend asynchronously. We don't > suspend PCI bridges anyway, iirc (but I didn't actually check). And at > worst, we can make the PCI _bridges_ know about async suspends, and solve > it that way - without actually making any normal PCI drivers do it. This sounds suspiciously like pushing the problem up a level and hoping it will go away. (Sometimes that even works.) In the end it isn't a very big issue. Using one vs. two passes in dpm_suspend() is pretty unimportant. Alan Stern P.S.: In fact I planned all along to handle USB host controllers asynchronously anyway, since their resume routines contain some long delays. I was merely using them as an example. ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 22:01 ` Alan Stern @ 2009-12-07 22:06 ` Linus Torvalds 2009-12-07 22:21 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-07 22:06 UTC (permalink / raw) To: Alan Stern Cc: Zhang Rui, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Mon, 7 Dec 2009, Alan Stern wrote: > > > > Make the USB host controller do its suspend asynchronously. We don't > > suspend PCI bridges anyway, iirc (but I didn't actually check). And at > > worst, we can make the PCI _bridges_ know about async suspends, and solve > > it that way - without actually making any normal PCI drivers do it. > > This sounds suspiciously like pushing the problem up a level and > hoping it will go away. (Sometimes that even works.) The "we don't suspend bridges anyway" is definitely a "hoping it will go away" issue. I think we did suspend bridges for a short while during the PM switch-over some time ago, and it worked most of the time, and then on some machines it just didn't work at all. Probably because ACPI ends up touching registers behind bridges that we closed down etc. So PCI bridges are kind of special. Right now we don't touch them, and if we ever do, that will be another issue. > In the end it isn't a very big issue. Using one vs. two passes in > dpm_suspend() is pretty unimportant. I also suspect that even if you do the USB host controller suspend synchronously, doing the actual USB devices asynchronously would still help - even if it's only "asynchronously per bus" thing. So in fact, it's probably a good first step to start off doing only the USB devices, not the controller. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 22:06 ` Linus Torvalds @ 2009-12-07 22:21 ` Alan Stern 2009-12-07 22:26 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-07 22:21 UTC (permalink / raw) To: Linus Torvalds Cc: Zhang Rui, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Mon, 7 Dec 2009, Linus Torvalds wrote: > I also suspect that even if you do the USB host controller suspend > synchronously, doing the actual USB devices asynchronously would still > help - even if it's only "asynchronously per bus" thing. > > So in fact, it's probably a good first step to start off doing only the > USB devices, not the controller. Interesting you should say that. The patch I asked Arjan to test involved not suspending USB devices at all (root hubs being the exception). That is in fact just what we do when CONFIG_USB_SUSPEND isn't set. There's no need to suspend the individual devices when the whole system is going down. They will automatically suspend when the controller stops sending out SOF packets, which occurs when the root hub is suspended. The USB spec describes this, grandiosely, as a "global suspend". But yes, I agree. Doing just the USB devices is a good first step. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 22:21 ` Alan Stern @ 2009-12-07 22:26 ` Linus Torvalds 2009-12-07 23:16 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-07 22:26 UTC (permalink / raw) To: Alan Stern Cc: Zhang Rui, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Mon, 7 Dec 2009, Alan Stern wrote: > > There's no need to suspend the individual devices when the whole system > is going down. They will automatically suspend when the controller > stops sending out SOF packets, which occurs when the root hub is > suspended. The USB spec describes this, grandiosely, as a "global > suspend". Ahh, but the sync vs async would then still matter on resume. No? Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 22:26 ` Linus Torvalds @ 2009-12-07 23:16 ` Alan Stern 0 siblings, 0 replies; 235+ messages in thread From: Alan Stern @ 2009-12-07 23:16 UTC (permalink / raw) To: Linus Torvalds Cc: Zhang Rui, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Mon, 7 Dec 2009, Linus Torvalds wrote: > > > On Mon, 7 Dec 2009, Alan Stern wrote: > > > > There's no need to suspend the individual devices when the whole system > > is going down. They will automatically suspend when the controller > > stops sending out SOF packets, which occurs when the root hub is > > suspended. The USB spec describes this, grandiosely, as a "global > > suspend". > > Ahh, but the sync vs async would then still matter on resume. No? That's complicated. If we assume the devices weren't runtime-suspended before the sleep began, then they would automatically resume themselves when the controller started transmitting SOF packets. So in that case resume would be fast and async wouldn't matter. But if the devices were runtime-suspended, then what? The safest course is to resume them during the system-wide resume. In that case yes, the sync vs async would matter. And if (as happens on many machines) the firmware messes up the controller settings during resume, then all the USB devices would have to be reset -- another slow procedure. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 21:41 ` Linus Torvalds 2009-12-07 21:47 ` Rafael J. Wysocki 2009-12-07 22:01 ` Alan Stern @ 2009-12-07 22:02 ` Rafael J. Wysocki 2009-12-07 22:16 ` Linus Torvalds 2 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-07 22:02 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Monday 07 December 2009, Linus Torvalds wrote: > > On Mon, 7 Dec 2009, Alan Stern wrote: > > > > It only seems that way because you didn't take into account devices > > that suspend synchronously but whose children suspend asynchronously. > > But why would I care? If somebody suspends synchronously, then that's what > he wants. > > > A synchronous suspend routine for a device with async child suspends > > would have to look just like your usb_node_suspend(): > > Sure. But that sounds like a "Doctor, it hurts when I do this" situation. > Don't do that. > > Make the USB host controller do its suspend asynchronously. We don't > suspend PCI bridges anyway, iirc (but I didn't actually check). And at > worst, we can make the PCI _bridges_ know about async suspends, and solve > it that way - without actually making any normal PCI drivers do it. BTW, I still don't quite understand why not to put the parent's down_write operation into the core. It's not going to hurt for the "synchronous" devices and the "asynchronous" ones will need to do it anyway. Also it looks like that's something to do unconditionally for all nodes having children, because the parent need not know if the children do async operations. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 22:02 ` Rafael J. Wysocki @ 2009-12-07 22:16 ` Linus Torvalds 2009-12-07 23:51 ` Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-07 22:16 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Mon, 7 Dec 2009, Rafael J. Wysocki wrote: > > BTW, I still don't quite understand why not to put the parent's down_write > operation into the core. It's not going to hurt for the "synchronous" devices > and the "asynchronous" ones will need to do it anyway. That's what I started out doing (see the first pseudo-code with the two phases). But it _does_ actually hurt. Because it will hurt exactly for the "multiple hubs" case: if you have two USB hubs in parallel (and the case that Alan pointed out about a USB host bridge is the exact same deal), then you want to be able to suspend and resume those two independent hubs in parallel too. But if you do the "down_write()" synchronously in the core, that means that you are also stopping the whole "traverse the tree" thing - so now you aren't handling the hubs in parallel even if you are handling all the devices _behind_ them asynchronously. This "serialize while traversing the tree" was what I was initially trying to avoid with the two-phase approach, but that I realized (after writing the resume path) that I could avoid much better by just moving the parents down_write into the asynchronous path. > Also it looks like that's something to do unconditionally for all nodes > having children, because the parent need not know if the children do async > operations. True, and that was (again) the first iteration. 
But see above: in order to allow way more concurrency, you don't want to introduce the false dependency between the write-lock and the traversal of the tree (or, as Alan points out - just a list - but that doesn't really change anything) that is introduced by taking the lock synchronously. So by moving the write-lock to the asynchronous work that also shuts down the parent, you avoid that whole unnecessary serialization. But that means that you can't do the lock in generic code. Unless you want to do _all_ of the async logic in generic code and re-introduce the "dev->async_suspend" flag. I would be ok with that now that the infrastructure seems so simple. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 22:16 ` Linus Torvalds @ 2009-12-07 23:51 ` Rafael J. Wysocki 2009-12-08 3:27 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-07 23:51 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Monday 07 December 2009, Linus Torvalds wrote: > > On Mon, 7 Dec 2009, Rafael J. Wysocki wrote: > > > > BTW, I still don't quite understand why not to put the parent's down_write > > operation into the core. It's not going to hurt for the "synchronous" devices > > and the "asynchronous" ones will need to do it anyway. > > That's what I started out doing (see the first pseudo-code with the two > phases). But it _does_ actually hurt. Hmm. If no one calls down_read() on the "synchronous" devices, their down_write()s will be nops. In turn, if somebody does call down_read(), it means they really need to wait for someone. They presumably don't need to wait for each other, but we don't really know that (otherwise they would have been "asynchronous"). > Because it will hurt exactly for the "multiple hubs" case: if you have two > USB hubs in parallel (and the case that Alan pointed out about a USB host > bridge is the exact same deal), then you want to be able to suspend and > resume those two independent hubs in parallel too. > > But if you do the "down_write()" synchronously in the core, that means > that you are also stopping the whole "traverse the tree" thing - so now > you aren't handling the hubs in parallel even if you are handling all the > devices _behind_ them asynchronously. > > This "serialize while traversing the tree" was what I was initially trying > to avoid with the two-phase approach, but that I realized (after writing > the resume path) that I could avoid much better by just moving the parents > down_write into the asynchronous path. But the asynchronous path has to be started somewhere. 
Basically, there are three possible places: the core itself, the bus type's suspend routine called by the core (same goes for resume of course), and the device driver's suspend routine called by the bus type. Now, I don't really see how we can put the parent's down_write() in a child's suspend routine, for multiple reasons (one of them being that there can be multiple asynchronous children the parent needs to wait for), so it looks like it needs to be above the driver's suspend. However, the parent can be on a different bus type than the children, so it looks like we can only start the asynchronous path at the core level. > > Also it looks like that's something to do unconditionally for all nodes > > having children, because the parent need not know if the children do async > > operations. > > True, and that was (again) the first iteration. But see above: in order to > allow way more concurrency, you don't want to introduce the false > dependency between the write-lock and the traversal of the tree (or, as > Alan points out - just a list - but that doesn't really change anything) > that is introduced by taking the lock synchronously. > > So by moving the write-lock to the asynchronous work that also shuts down > the parent, you avoid that whole unnecessary serialization. But that means > that you can't do the lock in generic code. > > Unless you want to do _all_ of the async logic in generic code and > re-introduce the "dev->async_suspend" flag. Quite frankly, I would like to. > I would be ok with that now that the infrastructure seems so simple. Well, perhaps I should dig out my original async suspend/resume patches that didn't contain all of the non-essential stuff and post them here for discussion, after all ... Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 23:51 ` Rafael J. Wysocki @ 2009-12-08 3:27 ` Alan Stern 2009-12-08 12:23 ` Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-08 3:27 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > However, the parent can be on a different bus type than the children, so it > looks like we can only start the asynchronous path at the core level. Agreed. > > Unless you want to do _all_ of the async logic in generic code and > > re-introduce the "dev->async_suspend" flag. > > Quite frankly, I would like to. > > > I would be ok with that now that the infrastructure seems so simple. > > Well, perhaps I should dig out my original async suspend/resume patches > that didn't contain all of the non-essential stuff and post them here for > discussion, after all ... That seems like a very good idea. IIRC they were quite similar to what we have been discussing. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 3:27 ` Alan Stern @ 2009-12-08 12:23 ` Rafael J. Wysocki 2009-12-08 12:35 ` Rafael J. Wysocki 2009-12-08 15:35 ` Linus Torvalds 0 siblings, 2 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-08 12:23 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tuesday 08 December 2009, Alan Stern wrote: > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > However, the parent can be on a different bus type than the children, so it > > looks like we can only start the asynchronous path at the core level. > > Agreed. > > > > Unless you want to do _all_ of the async logic in generic code and > > > re-introduce the "dev->async_suspend" flag. > > > > Quite frankly, I would like to. > > > > > I would be ok with that now that the infrastructure seems so simple. > > > > Well, perhaps I should dig out my original async suspend/resume patches > > that didn't contain all of the non-essential stuff and post them here for > > discussion, after all ... > > That seems like a very good idea. IIRC they were quite similar to what > we have been discussing. There you go. Below is the resume part. I have reworked the original patch a bit so that it's even simpler. I'll post the suspend part in a reply to this message. The idea is basically that if a device has the power.async_suspend flag set, we schedule the execution of its resume callback asynchronously, but we wait for the device's parent to finish resume before the device's resume is actually executed.
The wait queue plus the op_complete flag combo plays the role of the locking in the Linus' picture, and it's essentially equivalent, since the devices being waited for during resume will have to wait during suspend, so for example if A has to wait for B during suspend, then B will have to wait for A during resume (thus they both need to know in advance who's going to wait for them and whom they need to wait for). Of course, the code in this patch has the problem that if there are two "asynchronous" devices in dpm_list separated by a series of "synchronous" devices, then they usually won't be resumed in parallel (which is what we ultimately want). That can be optimised in a couple of ways, but such optimisations add quite some details to the code, so let's just omit them for now. BTW, thanks to the discussion with Linus I've realized that the off-tree dependences may be (relatively easily) taken into account by making the interested drivers directly execute dpm_wait() for the extra devices they need to wait for, so the entire PM links thing is simply unnecessary. So it looks like the only thing this patch is missing are the optimisations mentioned above. [This version of the patch has only been slightly tested.] 
--- drivers/base/power/main.c | 129 +++++++++++++++++++++++++++++++++++++++---- include/linux/device.h | 6 ++ include/linux/pm.h | 4 + include/linux/resume-trace.h | 7 ++ 4 files changed, 134 insertions(+), 12 deletions(-) Index: linux-2.6/include/linux/pm.h =================================================================== --- linux-2.6.orig/include/linux/pm.h +++ linux-2.6/include/linux/pm.h @@ -412,15 +412,17 @@ struct dev_pm_info { pm_message_t power_state; unsigned int can_wakeup:1; unsigned int should_wakeup:1; + unsigned async_suspend:1; enum dpm_state status; /* Owned by the PM core */ + wait_queue_head_t wait_queue; #ifdef CONFIG_PM_SLEEP struct list_head entry; + unsigned int op_complete:1; #endif #ifdef CONFIG_PM_RUNTIME struct timer_list suspend_timer; unsigned long timer_expires; struct work_struct work; - wait_queue_head_t wait_queue; spinlock_t lock; atomic_t usage_count; atomic_t child_count; Index: linux-2.6/include/linux/device.h =================================================================== --- linux-2.6.orig/include/linux/device.h +++ linux-2.6/include/linux/device.h @@ -472,6 +472,12 @@ static inline int device_is_registered(s return dev->kobj.state_in_sysfs; } +static inline void device_enable_async_suspend(struct device *dev, bool enable) +{ + if (dev->power.status == DPM_ON) + dev->power.async_suspend = enable; +} + void driver_init(void); /* Index: linux-2.6/drivers/base/power/main.c =================================================================== --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -25,6 +25,7 @@ #include <linux/resume-trace.h> #include <linux/rwsem.h> #include <linux/interrupt.h> +#include <linux/async.h> #include "../base.h" #include "power.h" @@ -42,6 +43,7 @@ LIST_HEAD(dpm_list); static DEFINE_MUTEX(dpm_list_mtx); +static pm_message_t pm_transition; /* * Set once the preparation of devices for a PM transition has started, reset @@ -56,6 +58,7 @@ static bool 
transition_started; void device_pm_init(struct device *dev) { dev->power.status = DPM_ON; + init_waitqueue_head(&dev->power.wait_queue); pm_runtime_init(dev); } @@ -162,6 +165,56 @@ void device_pm_move_last(struct device * } /** + * dpm_reset - Clear op_complete for given device. + * @dev: Device to handle. + */ +static void dpm_reset(struct device *dev) +{ + dev->power.op_complete = false; +} + +/** + * dpm_finish - Set op_complete for a device and wake up threads waiting for it. + */ +static void dpm_finish(struct device *dev) +{ + dev->power.op_complete = true; + wake_up_all(&dev->power.wait_queue); +} + +/** + * dpm_wait - Wait for a PM operation to complete. + * @dev: Device to wait for. + * @async: If true, ignore the device's async_suspend flag. + * + * Wait for a PM operation carried out for @dev to complete, unless @dev has to + * be handled synchronously and @async is false. + */ +static void dpm_wait(struct device *dev, bool async) +{ + if (!dev) + return; + + if (!(async || dev->power.async_suspend)) + return; + + if (!dev->power.op_complete) + wait_event(dev->power.wait_queue, !!dev->power.op_complete); +} + +/** + * dpm_synchronize - Wait for PM callbacks of all devices to complete. + */ +static void dpm_synchronize(void) +{ + struct device *dev; + + async_synchronize_full(); + list_for_each_entry(dev, &dpm_list, power.entry) + dpm_reset(dev); +} + +/** * pm_op - Execute the PM operation appropriate for given PM event. * @dev: Device to handle. * @ops: PM operations to choose from. @@ -334,25 +387,48 @@ static void pm_dev_err(struct device *de * The driver of @dev will not receive interrupts while this function is being * executed. 
*/ -static int device_resume_noirq(struct device *dev, pm_message_t state) +static int __device_resume_noirq(struct device *dev, pm_message_t state) { int error = 0; TRACE_DEVICE(dev); TRACE_RESUME(0); - if (!dev->bus) - goto End; - - if (dev->bus->pm) { + if (dev->bus && dev->bus->pm) { pm_dev_dbg(dev, state, "EARLY "); error = pm_noirq_op(dev, dev->bus->pm, state); } - End: + + dpm_finish(dev); + TRACE_RESUME(error); return error; } +static void async_resume_noirq(void *data, async_cookie_t cookie) +{ + struct device *dev = (struct device *)data; + int error; + + dpm_wait(dev->parent, true); + error = __device_resume_noirq(dev, pm_transition); + if (error) + pm_dev_err(dev, pm_transition, " async EARLY", error); + put_device(dev); +} + +static int device_resume_noirq(struct device *dev) +{ + if (dev->power.async_suspend && !pm_trace_is_enabled()) { + get_device(dev); + async_schedule(async_resume_noirq, dev); + return 0; + } + + dpm_wait(dev->parent, false); + return __device_resume_noirq(dev, pm_transition); +} + /** * dpm_resume_noirq - Execute "early resume" callbacks for non-sysdev devices. * @state: PM transition of the system being carried out. @@ -366,26 +442,28 @@ void dpm_resume_noirq(pm_message_t state mutex_lock(&dpm_list_mtx); transition_started = false; + pm_transition = state; list_for_each_entry(dev, &dpm_list, power.entry) if (dev->power.status > DPM_OFF) { int error; dev->power.status = DPM_OFF; - error = device_resume_noirq(dev, state); + error = device_resume_noirq(dev); if (error) pm_dev_err(dev, state, " early", error); } + dpm_synchronize(); mutex_unlock(&dpm_list_mtx); resume_device_irqs(); } EXPORT_SYMBOL_GPL(dpm_resume_noirq); /** - * device_resume - Execute "resume" callbacks for given device. + * __device_resume - Execute "resume" callbacks for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. 
*/ -static int device_resume(struct device *dev, pm_message_t state) +static int __device_resume(struct device *dev, pm_message_t state) { int error = 0; @@ -426,11 +504,36 @@ static int device_resume(struct device * } End: up(&dev->sem); + dpm_finish(dev); TRACE_RESUME(error); return error; } +static void async_resume(void *data, async_cookie_t cookie) +{ + struct device *dev = (struct device *)data; + int error; + + dpm_wait(dev->parent, true); + error = __device_resume(dev, pm_transition); + if (error) + pm_dev_err(dev, pm_transition, " async", error); + put_device(dev); +} + +static int device_resume(struct device *dev) +{ + if (dev->power.async_suspend && !pm_trace_is_enabled()) { + get_device(dev); + async_schedule(async_resume, dev); + return 0; + } + + dpm_wait(dev->parent, false); + return __device_resume(dev, pm_transition); +} + /** * dpm_resume - Execute "resume" callbacks for non-sysdev devices. * @state: PM transition of the system being carried out. @@ -444,6 +547,7 @@ static void dpm_resume(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); + pm_transition = state; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.next); @@ -454,7 +558,7 @@ static void dpm_resume(pm_message_t stat dev->power.status = DPM_RESUMING; mutex_unlock(&dpm_list_mtx); - error = device_resume(dev, state); + error = device_resume(dev); mutex_lock(&dpm_list_mtx); if (error) @@ -468,6 +572,7 @@ static void dpm_resume(pm_message_t stat put_device(dev); } list_splice(&list, &dpm_list); + dpm_synchronize(); mutex_unlock(&dpm_list_mtx); } @@ -793,8 +898,10 @@ static int dpm_prepare(pm_message_t stat break; } dev->power.status = DPM_SUSPENDING; - if (!list_empty(&dev->power.entry)) + if (!list_empty(&dev->power.entry)) { list_move_tail(&dev->power.entry, &list); + dpm_reset(dev); + } put_device(dev); } list_splice(&list, &dpm_list); Index: linux-2.6/include/linux/resume-trace.h =================================================================== 
--- linux-2.6.orig/include/linux/resume-trace.h +++ linux-2.6/include/linux/resume-trace.h @@ -6,6 +6,11 @@ extern int pm_trace_enabled; +static inline int pm_trace_is_enabled(void) +{ + return pm_trace_enabled; +} + struct device; extern void set_trace_device(struct device *); extern void generate_resume_trace(const void *tracedata, unsigned int user); @@ -17,6 +22,8 @@ extern void generate_resume_trace(const #else +static inline int pm_trace_is_enabled(void) { return 0; } + #define TRACE_DEVICE(dev) do { } while (0) #define TRACE_RESUME(dev) do { } while (0) ^ permalink raw reply [flat|nested] 235+ messages in thread
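[Editorial sketch, not part of the patch.] The op_complete flag plus wait queue in the resume patch above can be modelled in userspace to see how the ordering falls out of waiting alone. The following is a minimal sketch assuming POSIX threads; `struct node`, `node_finish()`, `node_wait()` and `resume_demo()` are hypothetical names that only mirror the roles of dpm_finish() and dpm_wait(). Unlike the patch, the flag here is protected by a mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a device and its PM state. */
struct node {
	struct node *parent;
	pthread_mutex_t lock;       /* protects op_complete */
	pthread_cond_t wait_queue;  /* plays the role of power.wait_queue */
	bool op_complete;
	int resume_seq;             /* order in which resume ran, 0 = not yet */
};

static pthread_mutex_t seq_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_seq;

static void node_init(struct node *n, struct node *parent)
{
	n->parent = parent;
	pthread_mutex_init(&n->lock, NULL);
	pthread_cond_init(&n->wait_queue, NULL);
	n->op_complete = false;
	n->resume_seq = 0;
}

/* Analogue of dpm_finish(): mark the operation done and wake all waiters. */
static void node_finish(struct node *n)
{
	pthread_mutex_lock(&n->lock);
	n->op_complete = true;
	pthread_cond_broadcast(&n->wait_queue);
	pthread_mutex_unlock(&n->lock);
}

/* Analogue of dpm_wait(): block until @n has finished; NULL means no parent. */
static void node_wait(struct node *n)
{
	if (!n)
		return;
	pthread_mutex_lock(&n->lock);
	while (!n->op_complete)
		pthread_cond_wait(&n->wait_queue, &n->lock);
	pthread_mutex_unlock(&n->lock);
}

/* Analogue of async_resume(): wait for the parent, "resume", then finish. */
static void *async_resume_thread(void *data)
{
	struct node *n = data;

	node_wait(n->parent);
	pthread_mutex_lock(&seq_lock);
	n->resume_seq = ++next_seq;   /* stands in for the resume callback */
	pthread_mutex_unlock(&seq_lock);
	node_finish(n);
	return NULL;
}

/* Returns 0 if every child resumed strictly after its parent. */
static int resume_demo(void)
{
	struct node nodes[4];
	pthread_t threads[4];
	int i, ok = 1;

	next_seq = 0;
	node_init(&nodes[0], NULL);        /* root */
	node_init(&nodes[1], &nodes[0]);
	node_init(&nodes[2], &nodes[0]);
	node_init(&nodes[3], &nodes[1]);

	/* Start children first on purpose: ordering must come from node_wait(). */
	for (i = 3; i >= 0; i--) {
		int err = pthread_create(&threads[i], NULL,
					 async_resume_thread, &nodes[i]);
		assert(err == 0);
	}
	for (i = 0; i < 4; i++)
		pthread_join(threads[i], NULL);
	for (i = 1; i < 4; i++)
		if (nodes[i].resume_seq <= nodes[i].parent->resume_seq)
			ok = 0;
	return ok ? 0 : -1;
}
```

Calling `resume_demo()` from a test program and checking for 0 exercises the parent-before-child guarantee even though the threads are created in the opposite order.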
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 12:23 ` Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) Rafael J. Wysocki @ 2009-12-08 12:35 ` Rafael J. Wysocki 2009-12-08 15:35 ` Linus Torvalds 1 sibling, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-08 12:35 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tuesday 08 December 2009, Rafael J. Wysocki wrote: > On Tuesday 08 December 2009, Alan Stern wrote: > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > > > However, the parent can be on a different bus type than the children, so it > > > looks like we can only start the asynchronous path at the core level. > > > > Agreed. > > > > > > Unless you want to do _all_ of the async logic in generic code and > > > > re-introduce the "dev->async_suspend" flag. > > > > > > Quite frankly, I would like to. > > > > > > > I would be ok with that now that the infrastructure seems so simple. > > > > > > Well, perhaps I should dig out my original async suspend/resume patches > > > that didn't contain all of the non-essential stuff and post them here for > > > discussion, after all ... > > > > That seems like a very good idea. IIRC they were quite similar to what > > we have been discussing. > > There you go. Below is the suspend part. It contains some extra code for rolling back the suspend if one of the asynchronous callbacks returns error code, but apart from this it's completely analogous to the resume part. [This patch has only been slightly tested.] 
--- drivers/base/power/main.c | 113 ++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 104 insertions(+), 9 deletions(-) Index: linux-2.6/drivers/base/power/main.c =================================================================== --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -202,6 +202,17 @@ static void dpm_wait(struct device *dev, wait_event(dev->power.wait_queue, !!dev->power.op_complete); } +static int device_pm_wait_fn(struct device *dev, void *async_ptr) +{ + dpm_wait(dev, *((bool *)async_ptr)); + return 0; +} + +static void dpm_wait_for_children(struct device *dev, bool async) +{ + device_for_each_child(dev, &async, device_pm_wait_fn); +} + /** * dpm_synchronize - Wait for PM callbacks of all devices to complete. */ @@ -638,6 +649,8 @@ static void dpm_complete(pm_message_t st mutex_unlock(&dpm_list_mtx); } +static int async_error; + /** * dpm_resume_end - Execute "resume" callbacks and complete system transition. * @state: PM transition of the system being carried out. @@ -685,20 +698,52 @@ static pm_message_t resume_event(pm_mess * The driver of @dev will not receive interrupts while this function is being * executed. 
*/ -static int device_suspend_noirq(struct device *dev, pm_message_t state) +static int __device_suspend_noirq(struct device *dev, pm_message_t state) { int error = 0; - if (!dev->bus) - return 0; - - if (dev->bus->pm) { + if (dev->bus && dev->bus->pm) { pm_dev_dbg(dev, state, "LATE "); error = pm_noirq_op(dev, dev->bus->pm, state); } + + dpm_finish(dev); + return error; } +static void async_suspend_noirq(void *data, async_cookie_t cookie) +{ + struct device *dev = (struct device *)data; + int error = async_error; + + if (error) + return; + + dpm_wait_for_children(dev, true); + error = __device_suspend_noirq(dev, pm_transition); + if (error) { + pm_dev_err(dev, pm_transition, " async LATE", error); + dev->power.status = DPM_OFF; + } + put_device(dev); + + if (error && !async_error) + async_error = error; +} + +static int device_suspend_noirq(struct device *dev) +{ + if (dev->power.async_suspend) { + get_device(dev); + async_schedule(async_suspend_noirq, dev); + return 0; + } + + dpm_wait_for_children(dev, false); + return __device_suspend_noirq(dev, pm_transition); +} + /** * dpm_suspend_noirq - Execute "late suspend" callbacks for non-sysdev devices. * @state: PM transition of the system being carried out. @@ -713,14 +758,21 @@ int dpm_suspend_noirq(pm_message_t state suspend_device_irqs(); mutex_lock(&dpm_list_mtx); + pm_transition = state; list_for_each_entry_reverse(dev, &dpm_list, power.entry) { - error = device_suspend_noirq(dev, state); + dev->power.status = DPM_OFF_IRQ; + error = device_suspend_noirq(dev); if (error) { pm_dev_err(dev, state, " late", error); + dev->power.status = DPM_OFF; + break; + } + if (async_error) { + error = async_error; break; } - dev->power.status = DPM_OFF_IRQ; } + dpm_synchronize(); mutex_unlock(&dpm_list_mtx); if (error) dpm_resume_noirq(resume_event(state)); @@ -733,7 +785,7 @@ EXPORT_SYMBOL_GPL(dpm_suspend_noirq); * @dev: Device to handle. * @state: PM transition of the system being carried out. 
*/ -static int device_suspend(struct device *dev, pm_message_t state) +static int __device_suspend(struct device *dev, pm_message_t state) { int error = 0; @@ -773,10 +825,45 @@ static int device_suspend(struct device } End: up(&dev->sem); + dpm_finish(dev); return error; } +static void async_suspend(void *data, async_cookie_t cookie) +{ + struct device *dev = (struct device *)data; + int error = async_error; + + if (error) + goto End; + + dpm_wait_for_children(dev, true); + error = __device_suspend(dev, pm_transition); + if (error) { + pm_dev_err(dev, pm_transition, " async", error); + + dev->power.status = DPM_SUSPENDING; + if (!async_error) + async_error = error; + } + + End: + put_device(dev); +} + +static int device_suspend(struct device *dev, pm_message_t state) +{ + if (dev->power.async_suspend) { + get_device(dev); + async_schedule(async_suspend, dev); + return 0; + } + + dpm_wait_for_children(dev, false); + return __device_suspend(dev, pm_transition); +} + /** * dpm_suspend - Execute "suspend" callbacks for all non-sysdev devices. * @state: PM transition of the system being carried out. 
@@ -788,10 +875,12 @@ static int dpm_suspend(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); + pm_transition = state; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.prev); get_device(dev); + dev->power.status = DPM_OFF; mutex_unlock(&dpm_list_mtx); error = device_suspend(dev, state); @@ -799,16 +888,21 @@ static int dpm_suspend(pm_message_t stat mutex_lock(&dpm_list_mtx); if (error) { pm_dev_err(dev, state, "", error); + dev->power.status = DPM_SUSPENDING; put_device(dev); break; } - dev->power.status = DPM_OFF; if (!list_empty(&dev->power.entry)) list_move(&dev->power.entry, &list); put_device(dev); + if (async_error) + break; } list_splice(&list, dpm_list.prev); + dpm_synchronize(); mutex_unlock(&dpm_list_mtx); + if (!error) + error = async_error; return error; } @@ -867,6 +961,7 @@ static int dpm_prepare(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); transition_started = true; + async_error = 0; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.next); ^ permalink raw reply [flat|nested] 235+ messages in thread
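[Editorial sketch, not part of the patch.] The rollback logic in the suspend patch relies on a shared async_error that workers check before starting and set when they fail. The patch uses a plain `static int`; the sketch below swaps in a C11 atomic compare-and-exchange so that "first error wins" holds without extra locking. The names `first_error`, `error_record()` and `worker_run()` and the sample callbacks are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared slot; the patch itself uses a plain "static int async_error". */
static atomic_int first_error;

static void error_reset(void)
{
	atomic_store(&first_error, 0);
}

/* Store @err only if no earlier error is recorded; return the error in effect. */
static int error_record(int err)
{
	int expected = 0;

	if (err)
		atomic_compare_exchange_strong(&first_error, &expected, err);
	return atomic_load(&first_error);
}

/*
 * Worker body: skip the callback when some other device has already failed,
 * otherwise run it and record its result. Returns 1 if the callback ran.
 */
static int worker_run(int (*callback)(void))
{
	if (atomic_load(&first_error))
		return 0;
	error_record(callback());
	return 1;
}

/* Sample callbacks with made-up error codes. */
static int cb_ok(void)        { return 0; }
static int cb_fail_io(void)   { return -5; }
static int cb_fail_busy(void) { return -16; }
```

After `error_reset()`, running `cb_ok`, then `cb_fail_io`, then `cb_fail_busy` through `worker_run()` leaves -5 recorded and skips the third callback, which matches the "abort the rest once something failed" behaviour the patch is after.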
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 12:23 ` Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) Rafael J. Wysocki 2009-12-08 12:35 ` Rafael J. Wysocki @ 2009-12-08 15:35 ` Linus Torvalds 2009-12-08 15:55 ` Alan Stern 2009-12-08 19:44 ` Rafael J. Wysocki 1 sibling, 2 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-08 15:35 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > The wait queue plus the op_complete flag combo plays the role of the locking > in the Linus' picture Please just use the lock. Don't make up your own locking crap. Really. Your patch is horrible. Exactly because your locking is horribly mis-designed. You can't say things are complete from an interrupt, for example, since you made it some random bitfield, which has unknown characteristics (ie non-atomic read-modify-write etc). The fact is, any time anybody makes up a new locking mechanism, THEY ALWAYS GET IT WRONG. Don't do it. I suggested using the rwsem locking for a good reason. It made sense. It was simpler. Just do it that way, stop making up crap. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 15:35 ` Linus Torvalds @ 2009-12-08 15:55 ` Alan Stern 2009-12-08 16:42 ` Linus Torvalds 2009-12-08 19:44 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-08 15:55 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Linus Torvalds wrote: > Please just use the lock. Don't make up your own locking crap. Really. > > Your patch is horrible. Exactly because your locking is horribly > mis-designed. You can't say things are complete from an interrupt, for > example, since you made it some random bitfield, which has unknown > characteristics (ie non-atomic read-modify-write etc). > > The fact is, any time anybody makes up a new locking mechanism, THEY > ALWAYS GET IT WRONG. Don't do it. > > I suggested using the rwsem locking for a good reason. It made sense. It > was simpler. Just do it that way, stop making up crap. The semantics needed for this kind of lock aren't really the same as for an rwsem (although obviously an rwsem will do the job). Basically it needs to have the capability for multiple users to lock it (no blocking when acquiring a lock) and the capability for a user to wait until it is totally unlocked. It could be implemented trivially using an atomic_t counter and a waitqueue head. Is this a standard sort of lock? It's a lot simpler than most others. I don't recall seeing anything quite like it anywhere; the closest thing might be some kind of barrier. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
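[Editorial sketch, not from the thread.] The primitive Alan describes above (any number of users may take a hold without waiting for each other, and one user can wait until it is totally unlocked) can be sketched in userspace as a counter plus a condition variable. This is only an illustration under assumed names (`struct pm_gate`, `gate_hold()`, `gate_release()`, `gate_wait_idle()`, `gate_demo()`); it uses a mutex-protected int rather than an atomic_t, and it is not a kernel API.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_CHILDREN 8

/* Hypothetical "count holders, wait for zero" primitive. */
struct pm_gate {
	pthread_mutex_t lock;
	pthread_cond_t idle;
	int holders;
};

static void gate_init(struct pm_gate *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->idle, NULL);
	g->holders = 0;
}

/* Take a hold; never waits for other holders. */
static void gate_hold(struct pm_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->holders++;
	pthread_mutex_unlock(&g->lock);
}

/* Drop a hold and wake waiters once the count reaches zero. */
static void gate_release(struct pm_gate *g)
{
	pthread_mutex_lock(&g->lock);
	if (--g->holders == 0)
		pthread_cond_broadcast(&g->idle);
	pthread_mutex_unlock(&g->lock);
}

/* Block until nobody holds the gate. */
static void gate_wait_idle(struct pm_gate *g)
{
	pthread_mutex_lock(&g->lock);
	while (g->holders > 0)
		pthread_cond_wait(&g->idle, &g->lock);
	pthread_mutex_unlock(&g->lock);
}

struct child {
	struct pm_gate *parent_gate;
	int suspended;
};

static void *child_suspend(void *data)
{
	struct child *c = data;

	c->suspended = 1;               /* stands in for the suspend callback */
	gate_release(c->parent_gate);   /* lets the parent continue */
	return NULL;
}

/* Returns how many children had finished when the parent was allowed to go. */
static int gate_demo(int nchildren)
{
	struct pm_gate gate;
	struct child children[MAX_CHILDREN];
	pthread_t threads[MAX_CHILDREN];
	int i, done = 0;

	assert(nchildren >= 0 && nchildren <= MAX_CHILDREN);
	gate_init(&gate);
	for (i = 0; i < nchildren; i++) {
		int err;

		children[i].parent_gate = &gate;
		children[i].suspended = 0;
		gate_hold(&gate);   /* taken before the async work is started */
		err = pthread_create(&threads[i], NULL, child_suspend, &children[i]);
		assert(err == 0);
	}
	gate_wait_idle(&gate);      /* the parent's turn comes only now */
	for (i = 0; i < nchildren; i++)
		done += children[i].suspended;
	for (i = 0; i < nchildren; i++)
		pthread_join(threads[i], NULL);
	return done;
}
```

The hold is taken by the scheduling thread and dropped by the worker, which is the cross-thread hand-off the suspend path needs.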
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 15:55 ` Alan Stern @ 2009-12-08 16:42 ` Linus Torvalds 2009-12-08 18:08 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-08 16:42 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Tue, 8 Dec 2009, Alan Stern wrote:
>
> The semantics needed for this kind of lock aren't really the same as
> for an rwsem (although obviously an rwsem will do the job). Basically
> it needs to have the capability for multiple users to lock it (no
> blocking when acquiring a lock) and the capability for a user to wait
> until it is totally unlocked. It could be implemented trivially using
> an atomic_t counter and a waitqueue head.
>
> Is this a standard sort of lock?

Yes it is.

It's called a rwlock. The counter is for readers, the exclusion is for
writers.

Really.

And the thing is, you actually do want the rwlock semantics, because on
the resume side you want the parent to lock it for writing first (so that
the children can wait for the parent to have completed its resume).

So we actually _want_ the full rwlock semantics.

See the code I posted earlier. Here condensed into one email:

 - resume:

        usb_node_resume(node)
        {
                // Wait for parent to finish resume
                down_read(node->parent->lock);
                // .. before resuming ourselves
                node->resume(node);

                // Now we're all done
                up_read(node->parent->lock);
                up_write(node->lock);
        }

        /* caller: */
        ..
        // This won't block, because we resume parents before children,
        // and the children will take the read lock.
        down_write(leaf->lock);
        // Do the blocking part asynchronously
        async_schedule(usb_node_resume, leaf);
        ..

 - suspend:

        usb_node_suspend(node)
        {
                // Start our suspend. This will block if we have any
                // children that are still busy suspending (they will
                // have done a down_read() in their suspend).
                down_write(node->lock);
                node->suspend(node);
                up_write(node->lock);

                // This lets our parent continue
                up_read(node->parent->lock);
        }

        /* caller: */

        // This won't block, because we suspend nodes before parents
        down_read(node->parent->lock);
        // Do the part that may block asynchronously
        async_schedule(usb_node_suspend, node);

It really should be that simple. Nothing more, nothing less. And with the
above, finishing the suspend (or resume) from interrupts is fine, and you
don't have any new lock that has undefined memory ordering issues etc.

                Linus

^ permalink raw reply [flat|nested] 235+ messages in thread
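The resume half of the scheme in the message above can be tried out in userspace. What follows is only a sketch under stated assumptions: `struct rwsem` here is a hypothetical stand-in built from a pthread mutex and condition variable (a kernel rwsem cannot be used outside the kernel, and `pthread_rwlock_t` must be unlocked by the thread that locked it, which this pattern does not do). The `struct node` layout, the `resumed_at` field, the NULL check for a root node and `resume_demo()` are all invented for illustration.

```c
#include <pthread.h>

/* Hypothetical userspace stand-in for a kernel rwsem. It does no owner
 * tracking, so one thread may release what another thread acquired. */
struct rwsem {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int readers;   /* number of current read holders */
    int writer;    /* 1 while held for writing */
};

static void rwsem_init(struct rwsem *s)
{
    pthread_mutex_init(&s->mu, NULL);
    pthread_cond_init(&s->cv, NULL);
    s->readers = 0;
    s->writer = 0;
}

static void down_read(struct rwsem *s)
{
    pthread_mutex_lock(&s->mu);
    while (s->writer)
        pthread_cond_wait(&s->cv, &s->mu);
    s->readers++;
    pthread_mutex_unlock(&s->mu);
}

static void up_read(struct rwsem *s)
{
    pthread_mutex_lock(&s->mu);
    if (--s->readers == 0)
        pthread_cond_broadcast(&s->cv);
    pthread_mutex_unlock(&s->mu);
}

static void down_write(struct rwsem *s)
{
    pthread_mutex_lock(&s->mu);
    while (s->writer || s->readers)
        pthread_cond_wait(&s->cv, &s->mu);
    s->writer = 1;
    pthread_mutex_unlock(&s->mu);
}

static void up_write(struct rwsem *s)
{
    pthread_mutex_lock(&s->mu);
    s->writer = 0;
    pthread_cond_broadcast(&s->cv);
    pthread_mutex_unlock(&s->mu);
}

struct node {
    struct rwsem lock;
    struct node *parent;
    int resumed_at;            /* sequence number of the resume, 0 = not yet */
};

static pthread_mutex_t seq_mu = PTHREAD_MUTEX_INITIALIZER;
static int seq;                /* protected by seq_mu */

/* Mirrors usb_node_resume(): wait for the parent, resume, release. */
static void *node_resume(void *arg)
{
    struct node *n = arg;

    if (n->parent)
        down_read(&n->parent->lock);   /* blocks until the parent is done */
    pthread_mutex_lock(&seq_mu);
    n->resumed_at = ++seq;             /* stands in for node->resume(node) */
    pthread_mutex_unlock(&seq_mu);
    if (n->parent)
        up_read(&n->parent->lock);
    up_write(&n->lock);                /* releases the caller's write lock */
    return NULL;
}

/* Returns 1 if the parent resumed before the child. */
static int resume_demo(void)
{
    struct node parent = { .parent = NULL };
    struct node child = { .parent = &parent };
    pthread_t tp, tc;

    rwsem_init(&parent.lock);
    rwsem_init(&child.lock);
    seq = 0;

    /* caller: write-lock parents before children; neither call blocks */
    down_write(&parent.lock);
    down_write(&child.lock);

    /* Start the child first on purpose; it still has to wait. */
    pthread_create(&tc, NULL, node_resume, &child);
    pthread_create(&tp, NULL, node_resume, &parent);
    pthread_join(tc, NULL);
    pthread_join(tp, NULL);
    return parent.resumed_at == 1 && child.resumed_at == 2;
}
```

Calling `resume_demo()` should return 1 regardless of which thread the scheduler runs first, because the child's `down_read()` on the parent cannot succeed until the parent's thread has called `up_write()`.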
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 16:42 ` Linus Torvalds @ 2009-12-08 18:08 ` Alan Stern 2009-12-08 18:41 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-08 18:08 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Linus Torvalds wrote: > On Tue, 8 Dec 2009, Alan Stern wrote: > > > > The semantics needed for this kind of lock aren't really the same as > > for an rwsem (although obviously an rwsem will do the job). Basically > > it needs to have the capability for multiple users to lock it (no > > blocking when acquiring a lock) and the capability for a user to wait > > until it is totally unlocked. It could be implemented trivially using > > an atomic_t counter and a waitqueue head. > > > > Is this a standard sort of lock? > > Yes it is. > > It's called a rwlock. The counter is for readers, the exclusion is for > writers. > > Really. > > And the thing is, you actually do want the rwlock semantics, because on > the resume side you want the parent to lock it for writing first (so that > the children can wait for the parent to have completed its resume. > > So we actually _want_ the full rwlock semantics. I'm not convinced. Condense the description a little farther: Suspend: Children lock the parent first. When they are finished they unlock the parent, allowing it to proceed. Resume: Parent locks itself first. When it is finished it unlocks itself, allowing the children to proceed. The whole readers vs. writers thing is a non-sequitur. (For instance, this never uses the fact that writers exclude each other.) In each case a lock is taken and eventually released, allowing someone else to stop waiting and move forward. In the suspend case we have multiple lockers and one waiter, whereas in the resume case we have one locker and multiple waiters. 
The simplest generalization is to allow both multiple lockers and multiple waiters. Call it a waitlock, for want of a better name: wait_lock(wl) { atomic_inc(&wl->count); } wait_unlock(wl) { if (atomic_dec_and_test(&wl->count)) { smp_mb__after_atomic_dec(); wake_up_all(wl->wqh); } } wait_for_lock(wl) { wait_event(wl->wqh, atomic_read(&wl->count) == 0); smp_rmb(); } Note that both wait_lock() and wait_unlock() can be called in_interrupt. > See the code I posted earlier. Here condensed into one email: > > - resume: > > usb_node_resume(node) > { > // Wait for parent to finish resume > down_read(node->parent->lock); > // .. before resuming outselves > node->resume(node) > > // Now we're all done > up_read(node->parent->lock); > up_write(node->lock); > } > > /* caller: */ > .. > // This won't block, because we resume parents before children, > // and the children will take the read lock. > down_write(leaf->lock); > // Do the blocking part asynchronously > async_schedule(usb_node_resume, leaf); > .. This becomes: usb_node_resume(node) { // Wait for parent to finish resume wait_for_lock(node->parent->lock); // .. before resuming outselves node->resume(node) // Now we're all done wait_unlock(node->lock); } /* caller: */ .. // This can't block, because wait_lock() is non-blocking. wait_lock(node->lock); // Do the blocking part asynchronously async_schedule(usb_node_resume, leaf); .. > - suspend: > > usb_node_suspend(node) > { > // Start our suspend. This will block if we have any > // children that are still busy suspending (they will > // have done a down_read() in their suspend). > down_write(node->lock); > node->suspend(node); > up_write(node->lock); > > // This lets our parent continue > up_read(node->parent->lock); > } > > /* caller: */ > > // This won't block, because we suspend nodes before parents > down_read(node->parent->lock); > // Do the part that may block asynchronously > async_schedule(do_usb_node_suspend, node); usb_node_suspend(node) { // Start our suspend. 
This will block if we have any // children that are still busy suspending (they will // have done a wait_lock() in their suspend). wait_for_lock(node->lock); node->suspend(node); // This lets our parent continue wait_unlock(node->parent->lock); } /* caller: */ .. // This can't block, because wait_lock is non-blocking. wait_lock(node->parent->lock); // Do the part that may block asynchronously async_schedule(do_usb_node_suspend, node); .. > It really should be that simple. Nothing more, nothing less. And with the > above, finishing the suspend (or resume) from interrupts is fine, and you > don't have any new lock that has undefined memory ordering issues etc. Aren't waitlocks simpler than rwsems? Not as generally useful, perhaps. But just as correct in this situation. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 18:08 ` Alan Stern @ 2009-12-08 18:41 ` Linus Torvalds 2009-12-08 18:52 ` Linus Torvalds 2009-12-08 19:30 ` Alan Stern 0 siblings, 2 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-08 18:41 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Alan Stern wrote: > > > > So we actually _want_ the full rwlock semantics. > > I'm not convinced. Condense the description a little farther: > > Suspend: Children lock the parent first. When they are > finished they unlock the parent, allowing it to > proceed. > > Resume: Parent locks itself first. When it is finished > it unlocks itself, allowing the children to proceed. Yes. You can implement it with a simple lock with a count. Nobody debates that. But a simple counting lock _is_ a rwlock. Really. They are 100% semantically equivalent. There is no difference. > The whole readers vs. writers thing is a non-sequitur. No it's not. It's a 100% equivalent problem. It's purely a change of wording. The end result is the same. > The simplest generalization is to allow both multiple lockers and > multiple waiters. Call it a waitlock, for want of a better name: But we have that. It _has_ a better name: rwlocks. And the reason the name is better is because now the name describes all the semantics to anybody who has ever taken a course in operating systems or in parallelism. It's also a better implementation, because it actually _works_. > wait_lock(wl) > { > atomic_inc(&wl->count); > } > > wait_unlock(wl) > { > if (atomic_dec_and_test(&wl->count)) { > smp_mb__after_atomic_dec(); > wake_up_all(wl->wqh); > } > } > > wait_for_lock(wl) > { > wait_event(wl->wqh, atomic_read(&wl->count) == 0); > smp_rmb(); > } > > Note that both wait_lock() and wait_unlock() can be called > in_interrupt. 
And note how even though you sprinkled random memory barriers around, you still got it wrong. So you just implemented a buggy lock, and for what gain? Tell me exactly why your buggy lock (assuming you'd know enough about memory ordering to actually fix it) is better than just using the existing one? It's certainly not smaller. It's not faster. It doesn't have support for lockdep. And it's BUGGY. Really. Tell me why you want to re-implement an existing lock - badly. [ Hint: you need a smp_mb() *before* the atomic_dec() in wait-unlock, so that anybody else who sees the new value will be guaranteed to have seen anything else the unlocker did. You also need a smp_mb() in the wait_for_lock(), not a smp_rmb(). Can't allow writes to migrate up either. 'atomic_read()' does not imply any barriers. But most architectures can optimize these things for their particular memory ordering model, and do so in their rwsem implementation. ] > This becomes: > > usb_node_resume(node) > { > // Wait for parent to finish resume > wait_for_lock(node->parent->lock); > // .. before resuming outselves > node->resume(node) > > // Now we're all done > wait_unlock(node->lock); > } > > /* caller: */ > .. > // This can't block, because wait_lock() is non-blocking. > wait_lock(node->lock); > // Do the blocking part asynchronously > async_schedule(usb_node_resume, leaf); > .. Umm? Same thing, different words? That "wait_for_lock()" is equivalent to a 'read_lock()+read_unlock()'. We _could_ expose such a mechanism for rwsem's too, but why do it? It's actually nicer to use a real read-lock - and do it _around_ the operation, because now the locking also automatically gets things like overlapping suspends and resumes right. (Which you'd obviously hope never happens, but it's nice from a conceptual standpoint to know that the locking is robust). > Aren't waitlocks simpler than rwsems? Not as generally useful, > perhaps. But just as correct in this situation. NO! Dammit. 
I started this whole rant with this comment to Rafael: "The fact is, any time anybody makes up a new locking mechanism, THEY ALWAYS GET IT WRONG. Don't do it." Take heed. You got it wrong. Admit it. Locking is _hard_. SMP memory ordering is HARD. So leave locking to the pro's. They _also_ got it wrong, but they got it wrong several years ago, and fixed up their sh*t. This is why you use generic locking. ALWAYS. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
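For readers who want to see the barrier placement from the hint above in runnable form, here is a userspace sketch of the counting lock using C11 atomics. It is not the kernel code and not anyone's proposed patch. The assumptions: a release decrement is used where the hint asks for a barrier before `atomic_dec()`, an acquire load is used where it asks for a full barrier in `wait_for_lock()` (acquire keeps later reads and writes from moving up past the load), and the waitqueue is replaced by a yield loop. C11 release/acquire is weaker than `smp_mb()` but is enough for this publish-then-observe pattern. `child_state`, `child_suspend()` and `suspend_demo()` are hypothetical names.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

struct waitlock {
    atomic_int count;
};

/* Non-blocking: just note one more holder. The caller does this before
 * the child thread is created, so thread creation orders it. */
static void wait_lock(struct waitlock *wl)
{
    atomic_fetch_add_explicit(&wl->count, 1, memory_order_relaxed);
}

/* Release ordering: everything this thread wrote before the decrement is
 * visible to a thread that observes the decrement with acquire. */
static void wait_unlock(struct waitlock *wl)
{
    atomic_fetch_sub_explicit(&wl->count, 1, memory_order_release);
}

/* Acquire ordering: neither reads nor writes after the loop may be
 * reordered before the load that saw count == 0. */
static void wait_for_lock(struct waitlock *wl)
{
    while (atomic_load_explicit(&wl->count, memory_order_acquire) != 0)
        sched_yield();
}

static struct waitlock parent_lock;
static int child_state;        /* plain, non-atomic data */

static void *child_suspend(void *arg)
{
    (void)arg;
    child_state = 42;              /* the child's "suspend" work */
    wait_unlock(&parent_lock);     /* lets the parent continue */
    return NULL;
}

/* Returns what the parent observes in child_state after waiting. */
static int suspend_demo(void)
{
    pthread_t t;
    int seen;

    atomic_init(&parent_lock.count, 0);
    child_state = 0;

    wait_lock(&parent_lock);       /* caller, before scheduling the child */
    pthread_create(&t, NULL, child_suspend, NULL);

    wait_for_lock(&parent_lock);   /* parent waits for its children */
    seen = child_state;            /* ordered after the child's write */

    pthread_join(t, NULL);
    return seen;
}
```

With the release/acquire pair in place, `suspend_demo()` returns 42. If both operations were relaxed, the C11 model would no longer order the plain read of `child_state` after the child's write, which is the class of bug the hint is pointing at.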
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 18:41 ` Linus Torvalds @ 2009-12-08 18:52 ` Linus Torvalds 2009-12-08 19:34 ` Alan Stern 2009-12-08 19:30 ` Alan Stern 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-08 18:52 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Linus Torvalds wrote: > > [ Hint: you need a smp_mb() *before* the atomic_dec() in wait-unlock, so > that anybody else who sees the new value will be guaranteed to have seen > anything else the unlocker did. > > You also need a smp_mb() in the wait_for_lock(), not a smp_rmb(). Can't > allow writes to migrate up either. 'atomic_read()' does not imply any > barriers. > > But most architectures can optimize these things for their particular > memory ordering model, and do so in their rwsem implementation. ] Side note: if this was a real lock, you'd also needed an smp_wmb() in the 'wait_lock()' path after the atomic_inc(), to make sure that others see the atomic lock was seen by other people before the suspend started. In your usage scenario, I don't think it would ever be noticeable, since the other users are always going to start running from the same thread that did the wait_lock(), so even if they run on other CPU's, we'll have scheduled _to_ those other CPU's and done enough memory ordering to guarantee that they will see the thing. So it would be ok in this situation, simply because it acts as an initializer and never sees any real SMP issues. But it's an example of how you now don't just depend on the locking primitives themselves doing the right thing, you end up depending very subtly on exactly how the lock is used. The standard locks do have the same kind of issue for initializers, but we avoid it elsewhere because it's so risky. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 18:52 ` Linus Torvalds @ 2009-12-08 19:34 ` Alan Stern 0 siblings, 0 replies; 235+ messages in thread From: Alan Stern @ 2009-12-08 19:34 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Linus Torvalds wrote: > Side note: if this was a real lock, you'd also needed an smp_wmb() in the > 'wait_lock()' path after the atomic_inc(), to make sure that others see > the atomic lock was seen by other people before the suspend started. > > In your usage scenario, I don't think it would ever be noticeable, since > the other users are always going to start running from the same thread > that did the wait_lock(), so even if they run on other CPU's, we'll have > scheduled _to_ those other CPU's and done enough memory ordering to > guarantee that they will see the thing. > > So it would be ok in this situation, simply because it acts as an > initializer and never sees any real SMP issues. Yes. I would have brought this up, but you made the point for me. > But it's an example of how you now don't just depend on the locking > primitives themselves doing the right thing, you end up depending very > subtly on exactly how the lock is used. The standard locks do have the > same kind of issue for initializers, but we avoid it elsewhere because > it's so risky. No doubt there are other reasons why the "wait-lock" pattern doesn't get used enough to be noticed. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 18:41 ` Linus Torvalds 2009-12-08 18:52 ` Linus Torvalds @ 2009-12-08 19:30 ` Alan Stern 2009-12-08 20:48 ` Linus Torvalds 1 sibling, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-08 19:30 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Linus Torvalds wrote: > > The whole readers vs. writers thing is a non-sequitur. > > No it's not. > > It's a 100% equivalent problem. It's purely a change of wording. The end > result is the same. Well, of course the end result is the same (ignoring bugs) -- that was the point. It doesn't follow that the two locking mechanisms are 100% equivalent. > And note how even though you sprinkled random memory barriers around, you > still got it wrong. Yes. That comes of trying to think at the keyboard. > It's certainly not smaller. It's not faster. It doesn't have support for > lockdep. And it's BUGGY. Lockdep will choke on the rwsem approach anyway. It has never been very good at handling tree-structured locking, especially when there are non-parent-child interactions. But never mind. > Really. Tell me why you want to re-implement an existing lock - badly. I didn't want to. The whole exercise was intended to make a point -- that rwsems do more than we really need here. > [ Hint: you need a smp_mb() *before* the atomic_dec() in wait-unlock, so > that anybody else who sees the new value will be guaranteed to have seen > anything else the unlocker did. Yes. > You also need a smp_mb() in the wait_for_lock(), not a smp_rmb(). Can't > allow writes to migrate up either. 'atomic_read()' does not imply any > barriers. No, that's not needed. Unlike reads, writes can't move in front of data or control dependencies. Or so I've been lead to believe... > That "wait_for_lock()" is equivalent to a 'read_lock()+read_unlock()'. Not really. 
It also corresponds to a 'write_lock()+write_unlock()' (in the suspend routine). Are you claiming these two compound operations are equivalent? > We > _could_ expose such a mechanism for rwsem's too, but why do it? It's > actually nicer to use a real read-lock - and do it _around_ the operation, > because now the locking also automatically gets things like overlapping > suspends and resumes right. > > (Which you'd obviously hope never happens, but it's nice from a conceptual > standpoint to know that the locking is robust). > Take heed. You got it wrong. Admit it. Locking is _hard_. SMP memory > ordering is HARD. Oh, there's no question about that. I never seriously intended this stuff to be adopted. It was just for discussion. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 19:30 ` Alan Stern @ 2009-12-08 20:48 ` Linus Torvalds 2009-12-08 21:32 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-08 20:48 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Alan Stern wrote: > > > You also need a smp_mb() in the wait_for_lock(), not a smp_rmb(). Can't > > allow writes to migrate up either. 'atomic_read()' does not imply any > > barriers. > > No, that's not needed. Unlike reads, writes can't move in front of > data or control dependencies. Or so I've been led to believe... Sure they can. Control dependencies are trivial - it's called "branch prediction", and everybody does it, and data dependencies don't exist on many CPU architectures (even to the point of reading through a pointer that you loaded). But yes, on x86, stores only move down. But that's an x86-specific thing. [ Not that it's also not very common - write buffering is easy and matters for performance, so any in-order implementation will generally do it. In contrast, writes moving up doesn't really help performance and is harder to do, but can happen with a weakly ordered memory subsystem especially if you have multi-way caches where some ways are busy and end up being congested. So the _common_ case is definitely about delaying writes and doing reads early if possible. But it's not necessarily at all guaranteed in general. ] > > That "wait_for_lock()" is equivalent to a 'read_lock()+read_unlock()'. > > Not really. It also corresponds to a 'write_lock()+write_unlock()' (in > the suspend routine). Are you claiming these two compound operations > are equivalent? They have separate semantics, and you just want to pick the one that suits you. Your counting lock doesn't have the "read_lock+read_unlock" version, it only has the write_lock/unlock one (ie it requires totally unlocked thing).
The point being, rwsem's can do everything your counting lock does. And they already exist. And they already know about all the subtleties of architecture-specific memory ordering etc. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 20:48 ` Linus Torvalds @ 2009-12-08 21:32 ` Alan Stern 2009-12-08 21:52 ` Christian Borntraeger 2009-12-08 22:16 ` Linus Torvalds 0 siblings, 2 replies; 235+ messages in thread From: Alan Stern @ 2009-12-08 21:32 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Linus Torvalds wrote: > > No, that's not needed. Unlike reads, writes can't move in front of > > data or control dependencies. Or so I've been lead to believe... > > Sure they can. Control dependencies are trivial - it's called "branch > prediction", and everybody does it, and data dependencies don't exist on > many CPU architectures (even to the point of reading through a pointer > that you loaded). Wait a second. Are you saying that with code like this: if (x == 1) y = 5; the CPU may write to y before it has finished reading the value of x? And this write is visible to other CPUs, so that if x was initially 0 and a second CPU sets x to 1, the second CPU may see y == 5 before it executes the write to x (whatever that may mean)? Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 21:32 ` Alan Stern @ 2009-12-08 21:52 ` Christian Borntraeger 2009-12-08 22:16 ` Linus Torvalds 1 sibling, 0 replies; 235+ messages in thread From: Christian Borntraeger @ 2009-12-08 21:52 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list > > Sure they can. Control dependencies are trivial - it's called "branch > > prediction", and everybody does it, and data dependencies don't exist on > > many CPU architectures (even to the point of reading through a pointer > > that you loaded). > > Wait a second. Are you saying that with code like this: > > if (x == 1) > y = 5; > > the CPU may write to y before it has finished reading the value of x? > And this write is visible to other CPUs, so that if x was initially 0 > and a second CPU sets x to 1, the second CPU may see y == 5 before it > executes the write to x (whatever that may mean)? No, the write really depends on x being 1 at any time before the comparison. On the other hand x being != 0 during the comparison does not prevent the write without proper locking or barriers. Have a look at http://www.linuxjournal.com/article/8211 http://www.linuxjournal.com/article/8212 especially at the alpha part what can happen when dealing with pointer accesses. Christian ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 21:32 ` Alan Stern 2009-12-08 21:52 ` Christian Borntraeger @ 2009-12-08 22:16 ` Linus Torvalds 2009-12-09 19:06 ` Alan Stern 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-08 22:16 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Alan Stern wrote: > > > > Sure they can. Control dependencies are trivial - it's called "branch > > prediction", and everybody does it, and data dependencies don't exist on > > many CPU architectures (even to the point of reading through a pointer > > that you loaded). > > Wait a second. Are you saying that with code like this: > > if (x == 1) > y = 5; > > the CPU may write to y before it has finished reading the value of x? Well, in a way. The branch may have been predicted, and the CPU can _internally_ have done the 'y=5' thing into a write buffer before it even did the read. Some time later it will have to _verify_ the prediction and then perhaps kill the write before it makes it to a data structure that is visible to others, but internally from the CPU standpoint, yes, the write could have happened before the read. Now, whether that write is "before" or "after" the read is debatable. But one way of looking at it is certainly that the write took place earlier, and the read might have just caused it to be undone. And there are real effects of this - looking at the bus, you might have a bus transaction to get the cacheline that contains 'y' for exclusive access happen _before_ the bus transaction that reads in the value of 'x' (but you'd never see the writeout of that '5' before). > And this write is visible to other CPUs, so that if x was initially 0 > and a second CPU sets x to 1, the second CPU may see y == 5 before it > executes the write to x (whatever that may mean)? Well, yes and no. 
CPU1 above won't release the '5' until it has confirmed the '1' (even if it does so by reading it late). But assuming the other CPU also does speculation, then yes, the situation you describe could happen. If the other CPU does z = y; x = 1; then it's certainly possible that 'z' contains 5 at the end (even if both x and y started out zero). Because now the read of 'y' on that other CPU might be delayed, and the write of 'x' goes ahead, CPU1 sees the 1, and commits its write of 5, so when CPU2 gets the cacheline, z will now contain 5. Is it likely? No. CPU microarchitectures aim to do reads early, and writes late. Reads are on the critical path, writes can be buffered. But you can basically get into "impossible" situations where a write that was _later_ in the instruction stream than a read (on CPU2, the 'store 1 to x' would be after the load of 'y' from memory) could show up in the other order on another CPU. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 22:16 ` Linus Torvalds @ 2009-12-09 19:06 ` Alan Stern 2009-12-09 21:52 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-09 19:06 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Tue, 8 Dec 2009, Linus Torvalds wrote:

> > Wait a second. Are you saying that with code like this:
> >
> >       if (x == 1)
> >               y = 5;
> >
> > the CPU may write to y before it has finished reading the value of x?
> > And this write is visible to other CPUs, so that if x was initially 0
> > and a second CPU sets x to 1, the second CPU may see y == 5 before it
> > executes the write to x (whatever that may mean)?
>
> Well, yes and no. CPU1 above won't release the '5' until it has confirmed
> the '1' (even if it does so by reading it late). But assuming the other
> CPU also does speculation, then yes, the situation you describe could
> happen. If the other CPU does
>
>       z = y;
>       x = 1;
>
> then it's certainly possible that 'z' contains 5 at the end (even if both
> x and y started out zero). Because now the read of 'y' on that other CPU
> might be delayed, and the write of 'x' goes ahead, CPU1 sees the 1, and
> commits its write of 5, so when CPU2 gets the cacheline, z will now
> contain 5.

That could be attributed to reordering on CPU2, so let's take CPU2's
peculiarities out of the picture (initially everything is set to 0):

        CPU1                            CPU2
        ----                            ----
        if (x == 1)                     z = y;
                y = 5;                  mb();
                                        x = 1;

This gets at the heart of the question: Can a write move up past a
control dependency?

Similar questions apply to the two types of data dependency:

        CPU1                            CPU2
        ----                            ----
        y = x + 4;                      z = y;
                                        mb();
                                        x = 1;

(Initially p points to x, not y):

        CPU1                            CPU2
        ----                            ----
        *p = 5;                         z = y;
                                        mb();
                                        p = &y;

Can z end up equal to 5 in any of these examples?

Alan Stern

^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 19:06 ` Alan Stern @ 2009-12-09 21:52 ` Linus Torvalds 0 siblings, 0 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-09 21:52 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Wed, 9 Dec 2009, Alan Stern wrote:
>
> That could be attributed to reordering on CPU2, so let's take CPU2's
> peculiarities out of the picture (initially everything is set to 0):
>
>       CPU1                            CPU2
>       ----                            ----
>       if (x == 1)                     z = y;
>               y = 5;                  mb();
>                                       x = 1;
>
> This gets at the heart of the question: Can a write move up past a
> control dependency?
> [ .. ]
> Can z end up equal to 5 in any of these examples?

In any _practical_ microarchitecture I know of, the above will never
result in 'z' being 5, even though CPU1 doesn't really have a memory
barrier. But if I read the alpha memory ordering guarantees right, then
at least in theory you really can end up with z=5.

Let me write that as five events (with the things in brackets being what
the alpha memory ordering manual calls them):

 - A is "read of x returns 1" on CPU1   [ P1:R(x,1) ]
 - B is "write of value 5 to y" on CPU1 [ P1:W(y,5) ]
 - C is "read of y returns 5" on CPU2   [ P2:R(y,5) ]
 - D is "write of value 1 to x" on CPU2 [ P2:W(x,1) ]
 - 'MB' is the mb() on CPU2             [ P2:MB ]

(The write of 'z' is irrelevant, we can think of it as a register, the
end result is the same).

And yes, if I read the alpha memory ordering rules correctly, you really
can end up with z=5, although I don't think you will ever find an alpha
_implementation_ that does it.

Why?

The alpha memory ordering literally defines ordering in two ways:

 - "location access order". But that is _only_ defined per actual
   location, so while 'x' can have a location access order specified by
   seeing certain values, there is no "location access order" for two
   different memory locations (x and y).

   The alpha architecture manual uses "A << B" to say "event A" is before
   "event B" when there is a defined ordering.

   So in the example above, there is a location access ordering between

        P2:W(x,1) << P1:R(x,1)

   and

        P1:W(y,5) << P2:R(y,5)

   ie you have D << A and B << C.

   Good so far, but that doesn't define anything else: there's only
   ordering between the pairs (D,A) and (B,C), nothing between them.

 - "Processor issue order" for two instructions is _only_ defined by
   either (a) memory barriers or (b) accesses to the _same_ locations.
   The alpha architecture manual uses "A < B" to say that "event A" is
   before "event B" in processor issue order.

   So there is a "Processor issue order" on CPU2 due to the memory
   barrier: P2:R(y,5) < P2:MB < P2:W(x,1), or put another way
   C < MB < D: C < D.

Now, the question is, can we actually get the behaviour of reading 5 on
CPU2 (ie P2:R(y,5)), and that is only possible if we can find an ordering
that satisfies all the constraints. We have

        D << A
        B << C
        C < D

and it seems to me that it is a possible situation: "B C D A" really does
satisfy all the constraints afaik.

So yes, according to the actual alpha architecture memory ordering rules,
you can see '5' from that first read of 'y'. DESPITE having a mb() on
CPU2.

In order to not see 5, you need to also specify "A < B", and the _only_
way to do that processor issue order specification is with a memory
barrier (or if the locations are the same, which they aren't).

"Causality" simply is nowhere in the officially defined alpha memory
ordering. The fact that we test 'x == 1' and conditionally do the write
simply doesn't enter the picture. I suspect you'd have a really hard time
not having causality in practice, but there _are_ things that can break
causality (value prediction etc), so it's not like you'd have to actually
violate physics of reality to do it.

IOW, you could at least in theory implement a CPU that does every
instruction speculatively in parallel, and then validates the end result
afterwards according to the architecture rules. And that CPU would
require the memory barrier on alpha.

(On x86, 'causality' is defined to be part of the memory ordering rules,
so on x86, you _do_ have a 'A < B' relationship. But not on alpha).

                Linus

^ permalink raw reply [flat|nested] 235+ messages in thread
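The constraint argument in the message above is small enough to check by brute force. The sketch below makes one simplifying assumption that is not in the message: it treats both "<<" (location access order) and "<" (processor issue order) as "comes earlier in a single sequence of the four events". Under that reading it enumerates all 24 orderings of A, B, C, D and counts the ones that satisfy D before A, B before C and C before D, optionally adding A before B. The helper names are invented for illustration.

```c
#include <string.h>

/* Position of event e ('A'..'D') within a 4-character ordering. */
static int pos(const char *order, char e)
{
    return (int)(strchr(order, e) - order);
}

/* Check one ordering against the constraints from the message:
 *   D before A, B before C, C before D
 * plus, optionally, A before B (what a barrier on CPU1 would add). */
static int satisfies(const char *o, int require_a_before_b)
{
    if (!(pos(o, 'D') < pos(o, 'A')))
        return 0;
    if (!(pos(o, 'B') < pos(o, 'C')))
        return 0;
    if (!(pos(o, 'C') < pos(o, 'D')))
        return 0;
    if (require_a_before_b && !(pos(o, 'A') < pos(o, 'B')))
        return 0;
    return 1;
}

/* Recursively permute buf[k..3]; count the valid complete orderings and
 * copy the first valid one into 'first' when that buffer is supplied. */
static int count_orders(char *buf, int k, int require, char *first)
{
    int n = 0;

    if (k == 4) {
        if (!satisfies(buf, require))
            return 0;
        if (first && first[0] == '\0')
            strcpy(first, buf);
        return 1;
    }
    for (int i = k; i < 4; i++) {
        char t = buf[k]; buf[k] = buf[i]; buf[i] = t;   /* choose */
        n += count_orders(buf, k + 1, require, first);
        t = buf[k]; buf[k] = buf[i]; buf[i] = t;        /* undo */
    }
    return n;
}

/* Number of orderings of A,B,C,D that satisfy the constraints. */
static int count_valid(int require_a_before_b, char first[5])
{
    char buf[5] = "ABCD";

    if (first)
        first[0] = '\0';
    return count_orders(buf, 0, require_a_before_b, first);
}
```

Without the extra constraint there is exactly one valid ordering, "BCDA", matching the "B C D A" in the message. Once A before B is required the constraints form a cycle (A, B, C, D, A) and no ordering survives, which is the sense in which a barrier on CPU1 rules out z == 5.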
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 15:35 ` Linus Torvalds 2009-12-08 15:55 ` Alan Stern @ 2009-12-08 19:44 ` Rafael J. Wysocki 2009-12-08 20:16 ` Alan Stern 2009-12-08 21:04 ` Linus Torvalds 1 sibling, 2 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-08 19:44 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tuesday 08 December 2009, Linus Torvalds wrote: > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > > The wait queue plus the op_complete flag combo plays the role of the locking > > in the Linus' picture > > Please just use the lock. Don't make up your own locking crap. Really. > > Your patch is horrible. Exactly because your locking is horribly > mis-designed. You can't say things are complete from an interrupt, for > example, since you made it some random bitfield, which has unknown > characteristics (ie non-atomic read-modify-write etc). I didn't assume anyone would check it from an interrupt, because I didn't see a point. In fact I didn't assume anyone except for the PM core would check it. In case this assumption is wrong, it can be easily put under the dev->sem that we take anyway before calling the bus type (etc.) callbacks. Anyway, if we use an rwsem, it won't be checkable from interrupt context just as well. > The fact is, any time anybody makes up a new locking mechanism, THEY > ALWAYS GET IT WRONG. Don't do it. > > I suggested using the rwsem locking for a good reason. It made sense. It > was simpler. Just do it that way, stop making up crap. Suppose we use rwsem and during suspend each child uses a down_read() on a parent and then the parent uses down_write() on itself. What if, whatever the reason, the parent is a bit early and does the down_write() before one of the children has a chance to do the down_read()? Aren't we toast? 
Do we need any direct protection against that or does it just work itself out in a way I just don't see right now? Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 19:44 ` Rafael J. Wysocki @ 2009-12-08 20:16 ` Alan Stern 2009-12-08 20:30 ` Rafael J. Wysocki 2009-12-08 21:08 ` Linus Torvalds 2009-12-08 21:04 ` Linus Torvalds 1 sibling, 2 replies; 235+ messages in thread From: Alan Stern @ 2009-12-08 20:16 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > Suppose we use rwsem and during suspend each child uses a down_read() on a > parent and then the parent uses down_write() on itself. What if, whatever the > reason, the parent is a bit early and does the down_write() before one of the > children has a chance to do the down_read()? Aren't we toast? > > Do we need any direct protection against that or does it just work itself out > in a way I just don't see right now? That's not the way it should be done. Linus had children taking their parents' locks during suspend, which is simple but leads to difficulties. Instead, the PM core should do a down_write() on each device before starting the device's async suspend routine, and an up_write() when the routine finishes. Parents should, at the start of their async routine, do down_read() on each of their children plus whatever other devices they need to wait for. The core can do the waiting for children part and the driver's suspend routine can handle any other waiting. This is a little more awkward because it requires the parent to iterate through its children. But it does solve the off-tree dependency problem for suspends. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 20:16 ` Alan Stern @ 2009-12-08 20:30 ` Rafael J. Wysocki 2009-12-08 20:44 ` Alan Stern 2009-12-08 21:08 ` Linus Torvalds 1 sibling, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-08 20:30 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tuesday 08 December 2009, Alan Stern wrote: > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > Suppose we use rwsem and during suspend each child uses a down_read() on a > > parent and then the parent uses down_write() on itself. What if, whatever the > > reason, the parent is a bit early and does the down_write() before one of the > > children has a chance to do the down_read()? Aren't we toast? > > > > Do we need any direct protection against that or does it just work itself out > > in a way I just don't see right now? > > That's not the way it should be done. Linus had children taking their > parents' locks during suspend, which is simple but leads to > difficulties. > > Instead, the PM core should do a down_write() on each device before > starting the device's async suspend routine, and an up_write() when the > routine finishes. Parents should, at the start of their async routine, > do down_read() on each of their children plus whatever other devices > they need to wait for. The core can do the waiting for children part > and the driver's suspend routine can handle any other waiting. > > This is a little more awkward because it requires the parent to iterate > through its children. I can live with that. > But it does solve the off-tree dependency problem for suspends. That's a plus, but I still think we're trying to create a barrier-alike mechanism using lock. There's one more possibility to consider, though. What if we use a completion instead of the flag + wait queue? It surely is a standard synchronization mechanism and it seems it might work here. 
Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 20:30 ` Rafael J. Wysocki @ 2009-12-08 20:44 ` Alan Stern 2009-12-08 20:52 ` Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-08 20:44 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > This is a little more awkward because it requires the parent to iterate > > through its children. > > I can live with that. > > > But it does solve the off-tree dependency problem for suspends. > > That's a plus, but I still think we're trying to create a barrier-alike > mechanism using lock. > > There's one more possibility to consider, though. What if we use a completion > instead of the flag + wait queue? It surely is a standard synchronization > mechanism and it seems it might work here. You're right. I should have thought of that. Linus's original approach couldn't use a completion because during suspend it needed to make one task (the parent) wait for a bunch of others (the children). But if you iterate through the children by hand, that objection no longer applies. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 20:44 ` Alan Stern @ 2009-12-08 20:52 ` Rafael J. Wysocki 2009-12-08 21:40 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-08 20:52 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tuesday 08 December 2009, Alan Stern wrote: > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > > This is a little more awkward because it requires the parent to iterate > > > through its children. > > > > I can live with that. > > > > > But it does solve the off-tree dependency problem for suspends. > > > > That's a plus, but I still think we're trying to create a barrier-alike > > mechanism using lock. > > > > There's one more possibility to consider, though. What if we use a completion > > instead of the flag + wait queue? It surely is a standard synchronization > > mechanism and it seems it might work here. > > You're right. I should have thought of that. Linus's original > approach couldn't use a completion because during suspend it needed to > make one task (the parent) wait for a bunch of others (the children). > But if you iterate through the children by hand, that objection no > longer applies. BTW, is there a good reason why completion_done() doesn't use spin_lock_irqsave and spin_unlock_irqrestore? complete() and complete_all() use them, so why not here? Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 20:52 ` Rafael J. Wysocki @ 2009-12-08 21:40 ` Alan Stern 2009-12-08 21:48 ` spinlock in completion_done() (was: Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33)) Rafael J. Wysocki 2009-12-08 22:18 ` Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) Linus Torvalds 0 siblings, 2 replies; 235+ messages in thread From: Alan Stern @ 2009-12-08 21:40 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > BTW, is there a good reason why completion_done() doesn't use spin_lock_irqsave > and spin_unlock_irqrestore? complete() and complete_all() use them, so why not > here? And likewise in try_wait_for_completion(). It looks like a bug. Maybe these routines were not intended to be called with interrupts disabled, but that requirement doesn't seem to be documented. And it isn't a natural requirement anyway. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* spinlock in completion_done() (was: Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33)) 2009-12-08 21:40 ` Alan Stern @ 2009-12-08 21:48 ` Rafael J. Wysocki 2009-12-09 9:29 ` Ingo Molnar 2009-12-08 22:18 ` Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) Linus Torvalds 1 sibling, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-08 21:48 UTC (permalink / raw) To: Alan Stern, Ingo Molnar Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tuesday 08 December 2009, Alan Stern wrote: > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > BTW, is there a good reason why completion_done() doesn't use spin_lock_irqsave > > and spin_unlock_irqrestore? complete() and complete_all() use them, so why not > > here? > > And likewise in try_wait_for_completion(). It looks like a bug. Maybe > these routines were not intended to be called with interrupts disabled, > but that requirement doesn't seem to be documented. And it isn't a > natural requirement anyway. OK, let's ask Ingo about that. Ingo, is there any particular reason why completion_done() and try_wait_for_completion() don't use spin_lock_irqsave() and spin_unlock_irqrestore()? Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: spinlock in completion_done() (was: Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33)) 2009-12-08 21:48 ` spinlock in completion_done() (was: Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33)) Rafael J. Wysocki @ 2009-12-09 9:29 ` Ingo Molnar 2009-12-09 22:37 ` Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Ingo Molnar @ 2009-12-09 9:29 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list, Peter Zijlstra, David Chinner, Lachlan McIlroy * Rafael J. Wysocki <rjw@sisk.pl> wrote: > On Tuesday 08 December 2009, Alan Stern wrote: > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > > > BTW, is there a good reason why completion_done() doesn't use spin_lock_irqsave > > > and spin_unlock_irqrestore? complete() and complete_all() use them, so why not > > > here? > > > > And likewise in try_wait_for_completion(). It looks like a bug. Maybe > > these routines were not intended to be called with interrupts disabled, > > but that requirement doesn't seem to be documented. And it isn't a > > natural requirement anyway. > > OK, let's ask Ingo about that. > > Ingo, is there any particular reason why completion_done() and > try_wait_for_completion() don't use spin_lock_irqsave() and > spin_unlock_irqrestore()? That's a bug that should be fixed - all the wakeup side (and atomic) variants of the completion API should be irq safe. It appears that these new completion APIs were added via the XFS tree about a year ago: 39d2f1a: [XFS] extend completions to provide XFS object flush requirements Please Cc: scheduler folks to all scheduler patches. Ingo ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: spinlock in completion_done() (was: Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33)) 2009-12-09 9:29 ` Ingo Molnar @ 2009-12-09 22:37 ` Rafael J. Wysocki 2009-12-10 7:59 ` Ingo Molnar 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-09 22:37 UTC (permalink / raw) To: Ingo Molnar Cc: Alan Stern, Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list, Peter Zijlstra, David Chinner, Lachlan McIlroy On Wednesday 09 December 2009, Ingo Molnar wrote: > > * Rafael J. Wysocki <rjw@sisk.pl> wrote: > > > On Tuesday 08 December 2009, Alan Stern wrote: > > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > > > > > BTW, is there a good reason why completion_done() doesn't use spin_lock_irqsave > > > > and spin_unlock_irqrestore? complete() and complete_all() use them, so why not > > > > here? > > > > > > And likewise in try_wait_for_completion(). It looks like a bug. Maybe > > > these routines were not intended to be called with interrupts disabled, > > > but that requirement doesn't seem to be documented. And it isn't a > > > natural requirement anyway. > > > > OK, let's ask Ingo about that. > > > > Ingo, is there any particular reason why completion_done() and > > try_wait_for_completion() don't use spin_lock_irqsave() and > > spin_unlock_irqrestore()? > > that's a bug that should be fixed - all the wakeup side (and atomic) > variants of completetion API should be irq safe. > > It appears that these new completion APIs were added via the XFS tree > about a year ago: > > 39d2f1a: [XFS] extend completions to provide XFS object flush requirements > > Please Cc: scheduler folks to all scheduler patches. If you haven't fixed it locally yet, would you mind me posting a fix? Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: spinlock in completion_done() (was: Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33)) 2009-12-09 22:37 ` Rafael J. Wysocki @ 2009-12-10 7:59 ` Ingo Molnar 2009-12-11 4:10 ` Dave Chinner 2009-12-12 23:07 ` [PATCH] sched: Make wakeup side variants of completion API irq safe (was: Re: spinlock in completion_done()) Rafael J. Wysocki 0 siblings, 2 replies; 235+ messages in thread From: Ingo Molnar @ 2009-12-10 7:59 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list, Peter Zijlstra, David Chinner, Lachlan McIlroy * Rafael J. Wysocki <rjw@sisk.pl> wrote: > On Wednesday 09 December 2009, Ingo Molnar wrote: > > > > * Rafael J. Wysocki <rjw@sisk.pl> wrote: > > > > > On Tuesday 08 December 2009, Alan Stern wrote: > > > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > > > > > > > BTW, is there a good reason why completion_done() doesn't use spin_lock_irqsave > > > > > and spin_unlock_irqrestore? complete() and complete_all() use them, so why not > > > > > here? > > > > > > > > And likewise in try_wait_for_completion(). It looks like a bug. Maybe > > > > these routines were not intended to be called with interrupts disabled, > > > > but that requirement doesn't seem to be documented. And it isn't a > > > > natural requirement anyway. > > > > > > OK, let's ask Ingo about that. > > > > > > Ingo, is there any particular reason why completion_done() and > > > try_wait_for_completion() don't use spin_lock_irqsave() and > > > spin_unlock_irqrestore()? > > > > that's a bug that should be fixed - all the wakeup side (and atomic) > > variants of completetion API should be irq safe. > > > > It appears that these new completion APIs were added via the XFS tree > > about a year ago: > > > > 39d2f1a: [XFS] extend completions to provide XFS object flush requirements > > > > Please Cc: scheduler folks to all scheduler patches. 
> > If you haven't fixed it locally yet, would you mind me posting a fix? I wouldnt mind it at all. Thanks, Ingo ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: spinlock in completion_done() (was: Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33)) 2009-12-10 7:59 ` Ingo Molnar @ 2009-12-11 4:10 ` Dave Chinner 2009-12-11 7:54 ` Ingo Molnar 2009-12-12 23:07 ` [PATCH] sched: Make wakeup side variants of completion API irq safe (was: Re: spinlock in completion_done()) Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Dave Chinner @ 2009-12-11 4:10 UTC (permalink / raw) To: Ingo Molnar Cc: Rafael J. Wysocki, Alan Stern, Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list, Peter Zijlstra, Lachlan McIlroy On Thu, Dec 10, 2009 at 08:59:47AM +0100, Ingo Molnar wrote: > * Rafael J. Wysocki <rjw@sisk.pl> wrote: > > On Wednesday 09 December 2009, Ingo Molnar wrote: > > > * Rafael J. Wysocki <rjw@sisk.pl> wrote: > > > > On Tuesday 08 December 2009, Alan Stern wrote: > > > > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > > > > > > > > > BTW, is there a good reason why completion_done() doesn't use spin_lock_irqsave > > > > > > and spin_unlock_irqrestore? complete() and complete_all() use them, so why not > > > > > > here? > > > > > > > > > > And likewise in try_wait_for_completion(). It looks like a bug. Maybe > > > > > these routines were not intended to be called with interrupts disabled, > > > > > but that requirement doesn't seem to be documented. And it isn't a > > > > > natural requirement anyway. When I implemented them they were not called from anywhere that disabled interrupts. IIRC the main reason I used spin_lock_irq() was because that is what wait_for_completion() used at the time.... > > > that's a bug that should be fixed - all the wakeup side (and atomic) > > > variants of completetion API should be irq safe. I see no problems with that ;) Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: spinlock in completion_done() (was: Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33)) 2009-12-11 4:10 ` Dave Chinner @ 2009-12-11 7:54 ` Ingo Molnar 0 siblings, 0 replies; 235+ messages in thread From: Ingo Molnar @ 2009-12-11 7:54 UTC (permalink / raw) To: Dave Chinner Cc: Rafael J. Wysocki, Alan Stern, Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list, Peter Zijlstra, Lachlan McIlroy * Dave Chinner <david@fromorbit.com> wrote: > On Thu, Dec 10, 2009 at 08:59:47AM +0100, Ingo Molnar wrote: > > * Rafael J. Wysocki <rjw@sisk.pl> wrote: > > > On Wednesday 09 December 2009, Ingo Molnar wrote: > > > > * Rafael J. Wysocki <rjw@sisk.pl> wrote: > > > > > On Tuesday 08 December 2009, Alan Stern wrote: > > > > > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > > > > > > > > > > > BTW, is there a good reason why completion_done() doesn't use spin_lock_irqsave > > > > > > > and spin_unlock_irqrestore? complete() and complete_all() use them, so why not > > > > > > > here? > > > > > > > > > > > > And likewise in try_wait_for_completion(). It looks like a bug. Maybe > > > > > > these routines were not intended to be called with interrupts disabled, > > > > > > but that requirement doesn't seem to be documented. And it isn't a > > > > > > natural requirement anyway. > > When I implemented them they were not called from anywhere that > disabled interrupts. IIRC the main reason I used spin_lock_irq() > was because that is what wait_for_completion() used at the time.... Obviously wait_for_completion(), as a non-atomic API that can block, will (and should) use _irq() - but atomic variants (complete, but also the try-wait thing) use irqsafe methods. A fair portion of completions happen in IRQ context. Ingo ^ permalink raw reply [flat|nested] 235+ messages in thread
* [PATCH] sched: Make wakeup side variants of completion API irq safe (was: Re: spinlock in completion_done()) 2009-12-10 7:59 ` Ingo Molnar 2009-12-11 4:10 ` Dave Chinner @ 2009-12-12 23:07 ` Rafael J. Wysocki 2009-12-13 7:36 ` [tip:sched/urgent] sched: Make wakeup side and atomic variants of completion API irq safe tip-bot for Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-12 23:07 UTC (permalink / raw) To: Ingo Molnar Cc: Alan Stern, Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list, Peter Zijlstra, David Chinner, Lachlan McIlroy

On Thursday 10 December 2009, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
>
> > On Wednesday 09 December 2009, Ingo Molnar wrote:
> > >
> > > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > >
> > > > On Tuesday 08 December 2009, Alan Stern wrote:
> > > > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote:
> > > > >
> > > > > > BTW, is there a good reason why completion_done() doesn't use spin_lock_irqsave
> > > > > > and spin_unlock_irqrestore? complete() and complete_all() use them, so why not
> > > > > > here?
> > > > >
> > > > > And likewise in try_wait_for_completion(). It looks like a bug. Maybe
> > > > > these routines were not intended to be called with interrupts disabled,
> > > > > but that requirement doesn't seem to be documented. And it isn't a
> > > > > natural requirement anyway.
> > > >
> > > > OK, let's ask Ingo about that.
> > > >
> > > > Ingo, is there any particular reason why completion_done() and
> > > > try_wait_for_completion() don't use spin_lock_irqsave() and
> > > > spin_unlock_irqrestore()?
> > >
> > > that's a bug that should be fixed - all the wakeup side (and atomic)
> > > variants of completion API should be irq safe.
> > >
> > > It appears that these new completion APIs were added via the XFS tree
> > > about a year ago:
> > >
> > > 39d2f1a: [XFS] extend completions to provide XFS object flush requirements
> > >
> > > Please Cc: scheduler folks to all scheduler patches.
> >
> > If you haven't fixed it locally yet, would you mind me posting a fix?
>
> I wouldnt mind it at all.

The fix is appended.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: sched: Make wakeup side variants of completion API irq safe

All the wakeup side variants of the completion API should be irq safe,
but completion_done() and try_wait_for_completion() aren't. Fix the
problem by making them use spin_lock_irqsave() and
spin_unlock_irqrestore().

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/sched.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -5931,14 +5931,15 @@ EXPORT_SYMBOL(wait_for_completion_killab
  */
 bool try_wait_for_completion(struct completion *x)
 {
+	unsigned long flags;
 	int ret = 1;
 
-	spin_lock_irq(&x->wait.lock);
+	spin_lock_irqsave(&x->wait.lock, flags);
 	if (!x->done)
 		ret = 0;
 	else
 		x->done--;
-	spin_unlock_irq(&x->wait.lock);
+	spin_unlock_irqrestore(&x->wait.lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(try_wait_for_completion);
@@ -5953,12 +5954,13 @@ EXPORT_SYMBOL(try_wait_for_completion);
  */
 bool completion_done(struct completion *x)
 {
+	unsigned long flags;
 	int ret = 1;
 
-	spin_lock_irq(&x->wait.lock);
+	spin_lock_irqsave(&x->wait.lock, flags);
 	if (!x->done)
 		ret = 0;
-	spin_unlock_irq(&x->wait.lock);
+	spin_unlock_irqrestore(&x->wait.lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(completion_done);

^ permalink raw reply [flat|nested] 235+ messages in thread
* [tip:sched/urgent] sched: Make wakeup side and atomic variants of completion API irq safe 2009-12-12 23:07 ` [PATCH] sched: Make wakeup side variants of completion API irq safe (was: Re: spinlock in completion_done()) Rafael J. Wysocki @ 2009-12-13 7:36 ` tip-bot for Rafael J. Wysocki 0 siblings, 0 replies; 235+ messages in thread From: tip-bot for Rafael J. Wysocki @ 2009-12-13 7:36 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, linux-pm, hpa, mingo, stern, a.p.zijlstra, torvalds, rui.zhang, lachlan, david, tglx, rjw, mingo

Commit-ID:  7539a3b3d1f892dd97eaf094134d7de55c13befe
Gitweb:     http://git.kernel.org/tip/7539a3b3d1f892dd97eaf094134d7de55c13befe
Author:     Rafael J. Wysocki <rjw@sisk.pl>
AuthorDate: Sun, 13 Dec 2009 00:07:30 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sun, 13 Dec 2009 08:12:46 +0100

sched: Make wakeup side and atomic variants of completion API irq safe

Alan Stern noticed that all the wakeup side (and atomic) variants of the
completion APIs should be irq safe, but the newly introduced
completion_done() and try_wait_for_completion() aren't. The use of the
irq unsafe variants in IRQ contexts can cause crashes/hangs.

Fix the problem by making them use spin_lock_irqsave() and
spin_unlock_irqrestore().

Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: pm list <linux-pm@lists.linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Chinner <david@fromorbit.com>
Cc: Lachlan McIlroy <lachlan@sgi.com>
LKML-Reference: <200912130007.30541.rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c | 10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index ff39cad..8b3532f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5908,14 +5908,15 @@ EXPORT_SYMBOL(wait_for_completion_killable);
  */
 bool try_wait_for_completion(struct completion *x)
 {
+	unsigned long flags;
 	int ret = 1;
 
-	spin_lock_irq(&x->wait.lock);
+	spin_lock_irqsave(&x->wait.lock, flags);
 	if (!x->done)
 		ret = 0;
 	else
 		x->done--;
-	spin_unlock_irq(&x->wait.lock);
+	spin_unlock_irqrestore(&x->wait.lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(try_wait_for_completion);
@@ -5930,12 +5931,13 @@ EXPORT_SYMBOL(try_wait_for_completion);
  */
 bool completion_done(struct completion *x)
 {
+	unsigned long flags;
 	int ret = 1;
 
-	spin_lock_irq(&x->wait.lock);
+	spin_lock_irqsave(&x->wait.lock, flags);
 	if (!x->done)
 		ret = 0;
-	spin_unlock_irq(&x->wait.lock);
+	spin_unlock_irqrestore(&x->wait.lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(completion_done);

^ permalink raw reply related [flat|nested] 235+ messages in thread
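Why the _irq variants are unsafe for callers that already run with interrupts disabled can be shown with a toy userspace model. This is only a sketch under stated assumptions: a single bool stands in for the CPU's interrupt-enable state, no actual spinlock is modelled, and every model_* name, struct model_completion and irqs_stay_disabled() are hypothetical helpers, not kernel APIs. The real primitives take the lock as well and keep the saved state in an unsigned long.

```c
#include <stdbool.h>

/* Toy stand-in for the CPU's interrupt-enable state. */
static bool irqs_enabled = true;

/* Model of spin_lock_irq()/spin_unlock_irq(): the unlock side
 * unconditionally re-enables interrupts. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Model of spin_lock_irqsave()/spin_unlock_irqrestore(): the unlock
 * side restores whatever state the caller had. */
static void model_lock_irqsave(bool *flags)
{
	*flags = irqs_enabled;
	irqs_enabled = false;
}

static void model_unlock_irqrestore(bool flags)
{
	irqs_enabled = flags;
}

struct model_completion {
	unsigned int done;
};

/* Shape of completion_done() before the patch. */
static bool completion_done_irq(struct model_completion *x)
{
	bool ret;

	model_lock_irq();
	ret = x->done != 0;
	model_unlock_irq();
	return ret;
}

/* Shape of completion_done() after the patch. */
static bool completion_done_irqsave(struct model_completion *x)
{
	bool flags, ret;

	model_lock_irqsave(&flags);
	ret = x->done != 0;
	model_unlock_irqrestore(flags);
	return ret;
}

/* Call `fn` with interrupts already disabled and report whether they
 * are still disabled when it returns. */
static bool irqs_stay_disabled(bool (*fn)(struct model_completion *))
{
	struct model_completion c = { .done = 1 };
	bool still_disabled;

	irqs_enabled = false;
	fn(&c);
	still_disabled = !irqs_enabled;
	irqs_enabled = true;	/* reset the model for the next caller */
	return still_disabled;
}
```

In this model the pre-patch shape hands control back with interrupts enabled even though the caller had them disabled, while the irqsave shape leaves the caller's state untouched.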
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 21:40 ` Alan Stern 2009-12-08 21:48 ` spinlock in completion_done() (was: Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33)) Rafael J. Wysocki @ 2009-12-08 22:18 ` Linus Torvalds 2009-12-09 2:11 ` Alan Stern 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-08 22:18 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Alan Stern wrote: > > And likewise in try_wait_for_completion(). It looks like a bug. Maybe > these routines were not intended to be called with interrupts disabled, > but that requirement doesn't seem to be documented. And it isn't a > natural requirement anyway. 'complete()' is supposed to be callable from interrupts, but the waiting ones aren't. But 'complete()' is all you should need to call from interrupts, so that's fine. So I think completions should work, if done right. That whole "make the parent wait for all the children to complete" is fine in that sense. And I'll happily take such an approach if my rwlock thing doesn't work. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 22:18 ` Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) Linus Torvalds @ 2009-12-09 2:11 ` Alan Stern 0 siblings, 0 replies; 235+ messages in thread From: Alan Stern @ 2009-12-09 2:11 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Linus Torvalds wrote: > On Tue, 8 Dec 2009, Alan Stern wrote: > > > > And likewise in try_wait_for_completion(). It looks like a bug. Maybe > > these routines were not intended to be called with interrupts disabled, > > but that requirement doesn't seem to be documented. And it isn't a > > natural requirement anyway. > > 'complete()' is supposed to be callable from interrupts, but the waiting > ones aren't. But 'complete()' is all you should need to call from > interrupts, so that's fine. And try_wait_for_completion()? The fact that it doesn't block makes it interrupt-safe. What's the point of having an interrupt-safe routine that you can't call from within interrupt handlers? Even if nobody uses it that way now, there's no guarantee somebody won't attempt it in the future. > So I think completions should work, if done right. That whole "make the > parent wait for all the children to complete" is fine in that sense. And > I'll happily take such an approach if my rwlock thing doesn't work. In principle the two approaches could be combined: Add an rwsem for use by children and a completion for off-tree[*] use. But that would certainly be overkill. Looping over children doesn't take a tremendous amount of time compared to a full system suspend. Alan Stern [*] "Off-tree" isn't really an appropriate term; these devices aren't "off" the tree. "Non-tree" would be better. ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 20:16 ` Alan Stern 2009-12-08 20:30 ` Rafael J. Wysocki @ 2009-12-08 21:08 ` Linus Torvalds 2009-12-08 21:13 ` Linus Torvalds 2009-12-08 22:07 ` Alan Stern 1 sibling, 2 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-08 21:08 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Tue, 8 Dec 2009, Alan Stern wrote:
>
> That's not the way it should be done. Linus had children taking their
> parents' locks during suspend, which is simple but leads to
> difficulties.

No it doesn't. Name them.

> Instead, the PM core should do a down_write() on each device before
> starting the device's async suspend routine, and an up_write() when the
> routine finishes.

No you should NOT do that. If you do that, you serialize the suspend
incorrectly and much too early. IOW, think a topology like this:

   a -> b -> c
    \
     > d -> e

where you'd want to suspend 'c' and 'e' asynchronously. If we do a
'down-write()' on b, then we'll delay until 'c' has suspended, and if we
have ordered the nodes in the obvious depth-first order, we'll walk the PM
device list in the order:

   c b e d a

and now we'll serialize on 'b', waiting for 'c' to suspend. Which we do
_not_ want to do, because the whole point was to suspend 'c' and 'e'
together.

> Parents should, at the start of their async routine,
> do down_read() on each of their children plus whatever other devices
> they need to wait for. The core can do the waiting for children part
> and the driver's suspend routine can handle any other waiting.

Why?

That just complicates things. Compare to my simple locking scheme I've
quoted several times.

Linus

^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 21:08 ` Linus Torvalds @ 2009-12-08 21:13 ` Linus Torvalds 2009-12-08 22:07 ` Alan Stern 1 sibling, 0 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-08 21:13 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Linus Torvalds wrote: > On Tue, 8 Dec 2009, Alan Stern wrote: > > > > That's not the way it should be done. Linus had children taking their > > parents' locks during suspend, which is simple but leads to > > difficulties. > > No it doesn't. Name them. Really. Let me put this simply: I've told you guys how to do it simply, with _zero_ crap. No "iterating over children". No games. No data structures. No new infrastructure. Just a single new rwlock per device, and _trivial_ code. So here's the challenge: try it my simple way first. I've quoted the code about five million times already. If you _actually_ see some problems, explain them. Don't make up stupid "iterate over each child" things. Don't claim totally made-up "leads to difficulties". Don't make it any more complicated than it needs to be. Keep it simple. And once you have tried that simple approach, and you really can show why it doesn't work, THEN you can try something else. But before you try the simple approach and explain why it wouldn't work, I simply will not pull anything more complex. Understood and agreed? Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 21:08 ` Linus Torvalds 2009-12-08 21:13 ` Linus Torvalds @ 2009-12-08 22:07 ` Alan Stern 2009-12-08 22:30 ` Rafael J. Wysocki 2009-12-08 22:32 ` Linus Torvalds 1 sibling, 2 replies; 235+ messages in thread From: Alan Stern @ 2009-12-08 22:07 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Linus Torvalds wrote: > On Tue, 8 Dec 2009, Alan Stern wrote: > > > > That's not the way it should be done. Linus had children taking their > > parents' locks during suspend, which is simple but leads to > > difficulties. > > No it doesn't. Name them. Well, one difficulty. It arises only because we are contemplating having the PM core fire up the async tasks, rather than having the drivers' suspend routines launch them (the way your original proposal did -- the difficulty does not arise there). Suppose A and B are unrelated devices and we need to impose the off-tree constraint that A suspends after B. With children taking their parent's lock, the way to prevent A from suspending too soon is by having B's suspend routine acquire A's lock. But B's suspend routine runs entirely in an async task, because that task is started by the PM core and it does the method call. Hence by the time B's suspend routine is called, A may already have begun suspending -- it's too late to take A's lock. To make the locking work, B would have to acquire A's lock _before_ B's async task starts. Since the PM core is unaware of the off-tree dependency, there's no simple way to make it work. > > Instead, the PM core should do a down_write() on each device before > > starting the device's async suspend routine, and an up_write() when the > > routine finishes. > > No you should NOT do that. If you do that, you serialize the suspend > incorrectly and much too early. 
> IOW, think a topology like this:
>
>    a -> b -> c
>      \
>       > d -> e
>
> where you'd want to suspend 'c' and 'e' asynchronously. If we do a
> 'down_write()' on b, then we'll delay until 'c' has suspended, and if we
> have ordered the nodes in the obvious depth-first order, we'll walk the PM
> device list in the order:
>
>    c b e d a
>
> and now we'll serialize on 'b', waiting for 'c' to suspend. Which we do
> _not_ want to do, because the whole point was to suspend 'c' and 'e'
> together.

You misunderstand. The suspend algorithm will look like this:

	dpm_suspend()
	{
		list_for_each_entry_reverse(dpm_list, dev) {
			down_write(dev->lock);
			async_schedule(device_suspend, dev);
		}
	}

	device_suspend(dev)
	{
		device_for_each_child(dev, child) {
			down_read(child->lock);
			up_read(child->lock);
		}
		dev->suspend(dev);	/* May do off-tree down+up pairs */
		up_write(dev->lock);
	}

With completions instead of rwsems, the down_write() changes to
init_completion(), the up_write() changes to complete_all(), and the
down_read()+up_read() pairs change to wait_for_completion().

So 'b' will wait for 'c' to suspend, as it must, but 'e' won't wait for
anything.

> > Parents should, at the start of their async routine,
> > do down_read() on each of their children plus whatever other devices
> > they need to wait for. The core can do the waiting for children part
> > and the driver's suspend routine can handle any other waiting.
>
> Why?
>
> That just complicates things. Compare to my simple locking scheme I've
> quoted several times.

It is a little more complicated in that it involves explicitly
iterating over children. But it is simpler in that it can use
completions instead of rwsems and it avoids the off-tree dependency
problem described above.

Alan Stern

^ permalink raw reply [flat|nested] 235+ messages in thread
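The claim that 'b' waits for 'c' while 'e' waits for nothing can be checked with a toy timing model. This is a user-space sketch under stated assumptions, not the PM core: every device is assumed to get its own async task at time 0, tasks are assumed to run fully in parallel, and a device calls its own suspend routine only after all of its children have finished. `struct dev_model`, `finish_time()` and the cost values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical device model: index of the parent in the array (-1 for
 * the root) and the time the device's own suspend routine takes, in
 * arbitrary units.
 */
struct dev_model {
	int parent;
	int cost;
};

/*
 * Time at which device i finishes suspending, assuming all async tasks
 * start at time 0 and each one waits only for its children before
 * running its own suspend routine.
 */
static int finish_time(const struct dev_model *devs, size_t n, size_t i)
{
	int start = 0;
	size_t j;

	assert(i < n);
	for (j = 0; j < n; j++) {
		if (devs[j].parent == (int)i) {
			int t = finish_time(devs, n, j);

			if (t > start)
				start = t;	/* wait for the slowest child */
		}
	}
	return start + devs[i].cost;
}
```

With the devices indexed a=0, b=1, c=2, d=3, e=4, a slow 'c' (cost 5) and cost 1 for everything else, 'e' finishes at time 1 and 'd' at time 2 regardless of 'c', while 'b' finishes at 6 and 'a' at 7. A fully serial walk would take 9 units under the same costs.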
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 22:07 ` Alan Stern @ 2009-12-08 22:30 ` Rafael J. Wysocki 2009-12-09 2:23 ` Alan Stern 2009-12-08 22:32 ` Linus Torvalds 1 sibling, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-08 22:30 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tuesday 08 December 2009, Alan Stern wrote: > On Tue, 8 Dec 2009, Linus Torvalds wrote: > > > On Tue, 8 Dec 2009, Alan Stern wrote: > > > > > > That's not the way it should be done. Linus had children taking their > > > parents' locks during suspend, which is simple but leads to > > > difficulties. > > > > No it doesn't. Name them. > > Well, one difficulty. It arises only because we are contemplating > having the PM core fire up the async tasks, rather than having the > drivers' suspend routines launch them (the way your original proposal > did -- the difficulty does not arise there). > > Suppose A and B are unrelated devices and we need to impose the > off-tree constraint that A suspends after B. With children taking > their parent's lock, the way to prevent A from suspending too soon is > by having B's suspend routine acquire A's lock. > > But B's suspend routine runs entirely in an async task, because that > task is started by the PM core and it does the method call. Hence by > the time B's suspend routine is called, A may already have begun > suspending -- it's too late to take A's lock. To make the locking > work, B would have to acquire A's lock _before_ B's async task starts. > Since the PM core is unaware of the off-tree dependency, there's no > simple way to make it work. Do not set async_suspend for B and instead start your own async thread from its suspend callback. The parent-children synchronization is done by the core anyway (at least I'd do it that way), so the only thing you need to worry about is the extra dependency. > > That just complicates things. 
Compare to my simple locking scheme I've > > quoted several times. > > It is a little more complicated in that it involves explicitly > iterating over children. But it is simpler in that it can use > completions instead of rwsems and it avoids the off-tree dependency > problem described above. I would be slightly more comfortable using completions, but the rwsem-based approach is fine with me as well. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 22:30 ` Rafael J. Wysocki @ 2009-12-09 2:23 ` Alan Stern 2009-12-09 21:56 ` Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-09 2:23 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > Well, one difficulty. It arises only because we are contemplating > > having the PM core fire up the async tasks, rather than having the > > drivers' suspend routines launch them (the way your original proposal > > did -- the difficulty does not arise there). > > > > Suppose A and B are unrelated devices and we need to impose the > > off-tree constraint that A suspends after B. With children taking > > their parent's lock, the way to prevent A from suspending too soon is > > by having B's suspend routine acquire A's lock. > > > > But B's suspend routine runs entirely in an async task, because that > > task is started by the PM core and it does the method call. Hence by > > the time B's suspend routine is called, A may already have begun > > suspending -- it's too late to take A's lock. To make the locking > > work, B would have to acquire A's lock _before_ B's async task starts. > > Since the PM core is unaware of the off-tree dependency, there's no > > simple way to make it work. > > Do not set async_suspend for B and instead start your own async thread > from its suspend callback. The parent-children synchronization is done by the > core anyway (at least I'd do it that way), so the only thing you need to worry > about is the extra dependency. I don't like that because it introduces "artificial" dependencies: It makes B depend on all the preceding synchronous suspends, even totally unrelated ones. But yes, it would work. > I would be slightly more comfortable using completions, but the rwsem-based > approach is fine with me as well. 
On the principle of making things as easy and foolproof as possible for driver authors, I also favor completions since it makes dealing with non-tree dependencies easier. However either way would be okay. I do have to handle some non-tree dependencies in USB, but oddly enough they affect only resume, not suspend. So this "who starts the async task" issue doesn't apply. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 2:23 ` Alan Stern @ 2009-12-09 21:56 ` Rafael J. Wysocki 2009-12-09 22:27 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-09 21:56 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wednesday 09 December 2009, Alan Stern wrote: > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > > Well, one difficulty. It arises only because we are contemplating > > > having the PM core fire up the async tasks, rather than having the > > > drivers' suspend routines launch them (the way your original proposal > > > did -- the difficulty does not arise there). > > > > > > Suppose A and B are unrelated devices and we need to impose the > > > off-tree constraint that A suspends after B. With children taking > > > their parent's lock, the way to prevent A from suspending too soon is > > > by having B's suspend routine acquire A's lock. > > > > > > But B's suspend routine runs entirely in an async task, because that > > > task is started by the PM core and it does the method call. Hence by > > > the time B's suspend routine is called, A may already have begun > > > suspending -- it's too late to take A's lock. To make the locking > > > work, B would have to acquire A's lock _before_ B's async task starts. > > > Since the PM core is unaware of the off-tree dependency, there's no > > > simple way to make it work. > > > > Do not set async_suspend for B and instead start your own async thread > > from its suspend callback. The parent-children synchronization is done by the > > core anyway (at least I'd do it that way), so the only thing you need to worry > > about is the extra dependency. > > I don't like that because it introduces "artificial" dependencies: It > makes B depend on all the preceding synchronous suspends, even totally > unrelated ones. But yes, it would work. 
Well, unfortunately, it wouldn't, because (at least in the context of my last patch) the core would release the rwsems as soon as your suspend had returned. So you'd have to make your suspend wait for the async thread and that would make it pointless. So scratch that, it wasn't a good idea at all. This leaves us with basically two options, where the first one is to use rwsems in a way that you've proposed (with iterating over children), and the second one is to use completions. In my opinion rwsems don't give us any advantage in this case, so I'd very much prefer to use completions. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 21:56 ` Rafael J. Wysocki @ 2009-12-09 22:27 ` Alan Stern 0 siblings, 0 replies; 235+ messages in thread From: Alan Stern @ 2009-12-09 22:27 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, 9 Dec 2009, Rafael J. Wysocki wrote: > > I don't like that because it introduces "artificial" dependencies: It > > makes B depend on all the preceding synchronous suspends, even totally > > unrelated ones. But yes, it would work. > > Well, unfortunately, it wouldn't, because (at least in the context of my last > patch) the core would release the rwsems as soon as your suspend had > returned. So you'd have to make your suspend wait for the async thread and > that would make it pointless. So scratch that, it wasn't a good idea at all. > > This leaves us with basically two options, where the first one is to use > rwsems in a way that you've proposed (with iterating over children), and the > second one is to use completions. In my opinion rwsems don't give us any > advantage in this case, so I'd very much prefer to use completions. If you really want to add support for async suspend constraints, then completions are clearer than rwsems. If you don't care (and it's unlikely that anyone will need them in the near future) then you might as well stick with the current rwsem implementation and avoid iterating over children. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 22:07 ` Alan Stern 2009-12-08 22:30 ` Rafael J. Wysocki @ 2009-12-08 22:32 ` Linus Torvalds 2009-12-09 2:35 ` Alan Stern 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-08 22:32 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Tue, 8 Dec 2009, Alan Stern wrote:
>
> Suppose A and B are unrelated devices and we need to impose the
> off-tree constraint that A suspends after B.

Ah. Ok, I can imagine the off-tree constraints, but part of my "keep it
simple" was to simply not do them. If there are constraints that aren't in
the topology of the tree, then I simply don't think that async is worth it
in the first place.

> You misunderstand. The suspend algorithm will look like this:
>
> 	dpm_suspend()
> 	{
> 		list_for_each_entry_reverse(dpm_list, dev) {
> 			down_write(dev->lock);
> 			async_schedule(device_suspend, dev);
> 		}
> 	}
>
> 	device_suspend(dev)
> 	{
> 		device_for_each_child(dev, child) {
> 			down_read(child->lock);
> 			up_read(child->lock);
> 		}
> 		dev->suspend(dev);	/* May do off-tree down+up pairs */
> 		up_write(dev->lock);
> 	}

Ok, so the above I think works (and see my previous email: I think
completions would be workable there too).

It's just that I think the "looping over children" is ugly, when I think
that by doing it the other way around you can make the code simpler and
only depend on the PM device list and a simple parent pointer access.

I also think that you are wrong that the above somehow protects against
non-topological dependencies. If the device you want to delay your own
suspend for is after you in the list, the down_read() on that may succeed
simply because it hasn't even done its down_write() yet and you got
scheduled early.

But I guess you could do that by walking the list twice (first to lock
them all, then to actually call the suspend function).
That whole two-phase thing, except the first phase _only_ locks, and doesn't do any callbacks. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 22:32 ` Linus Torvalds @ 2009-12-09 2:35 ` Alan Stern 2009-12-09 2:54 ` Linus Torvalds 2009-12-09 13:38 ` Mark Brown 0 siblings, 2 replies; 235+ messages in thread From: Alan Stern @ 2009-12-09 2:35 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Linus Torvalds wrote: > It's just that I think the "looping over children" is ugly, when I think > that by doing it the other way around you can make the code simpler and > only depend on the PM device list and a simple parent pointer access. I agree that it is uglier. The only advantage is in handling asynchronous non-tree suspend dependencies, of which we probably won't have very many. In fact, I don't know of _any_ offhand. Interestingly, this non-tree dependency problem does not affect resume. > I also think that you are wrong that the above somehow protects against > non-topological dependencies. If the device you want to delay > your own suspend for is after you in the list, the down_read() on that > may succeed simply because it hasn't even done its down_write() yet and > you got scheduled early. You mean, if A comes before B in the list and A must suspend after B? Then A's down_read() on B _can't_ occur before B's down_write() on itself. The down_write() on B happens before the list_for_each_entry_reverse() iteration reaches A; it even happens before B's async task is launched. > But I guess you could do that by walking the list twice (first to lock > them all, then to actually call the suspend function). That whole > two-phase thing, except the first phase _only_ locks, and doesn't do any > callbacks. Not necessary. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 2:35 ` Alan Stern @ 2009-12-09 2:54 ` Linus Torvalds 2009-12-09 15:24 ` Alan Stern 2009-12-09 13:38 ` Mark Brown 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-09 2:54 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Alan Stern wrote: > > You mean, if A comes before B in the list and A must suspend after B? But if they are not topologically ordered, then A wouldn't necessarily be before B on the list in the first place. Of course, if we've mucked with the list by hand and made sure the ordering is ok, then that's a different issue. But your whole point seemed to be that the device could impose its own ordering in its suspend callback, which is not true on its own without external ordering. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 2:54 ` Linus Torvalds @ 2009-12-09 15:24 ` Alan Stern 2009-12-09 15:38 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-09 15:24 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Linus Torvalds wrote: > On Tue, 8 Dec 2009, Alan Stern wrote: > > > > You mean, if A comes before B in the list and A must suspend after B? > > But if they are not topologically ordered, then A wouldn't necessarily be > before B on the list in the first place. Okay, I see what you're getting at. Yes, this is quite true -- if A doesn't precede B in dpm_list then A can't safely wait for B to suspend. To put it another way, only list-compatible constraints are feasible. This shouldn't be a problem. If it were we'd be seeing it right now, because A would _always_ suspend before B. > Of course, if we've mucked with the list by hand and made sure the > ordering is ok, then that's a different issue. But your whole point seemed > to be that the device could impose its own ordering in its suspend > callback, which is not true on its own without external ordering. No, sorry for not making it clearer. I was assuming all along that the non-tree constraints were compatible with the list ordering. In fact these considerations already affect the USB resume operations, even without asynchronous resume. The code relies on the fact that the PCI layer registers sibling devices on a slot in order of increasing function number. There's no guarantee this will remain true in the future (it may already be wrong AFAIK), so putting in some explicit list manipulation is the prudent thing to do. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
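The "list-compatible" condition from the message above can be written down as a small check. This is a user-space sketch, not kernel code, and it assumes the scheme discussed in this thread: the core walks dpm_list in reverse and marks each device busy before launching its async task, so a device can safely wait only for devices that come later in dpm_list. The function names and the string-based device list are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Position of name in list[], or -1 if it is not on the list. */
static int list_index(const char *const *list, size_t n, const char *name)
{
	size_t i;

	assert(list != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(list[i], name) == 0)
			return (int)i;
	return -1;
}

/*
 * Return 1 if the constraint "waiter suspends after target" is
 * compatible with the list order, 0 otherwise.  During suspend the
 * list is walked in reverse, so target is marked busy before waiter's
 * task is launched only if waiter precedes target in dpm_list.
 */
static int suspend_wait_is_list_compatible(const char *const *dpm_list,
					   size_t n, const char *waiter,
					   const char *target)
{
	int w = list_index(dpm_list, n, waiter);
	int t = list_index(dpm_list, n, target);

	if (w < 0 || t < 0)
		return 0;	/* unknown device: treat as not safe */
	return w < t;
}
```

For a list in which A precedes B, "A suspends after B" passes the check and "B suspends after A" does not; the second case is the one that would need explicit list manipulation before any waiting could be relied on.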
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 15:24 ` Alan Stern @ 2009-12-09 15:38 ` Linus Torvalds 2009-12-09 15:57 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-09 15:38 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, 9 Dec 2009, Alan Stern wrote: > > In fact these considerations already affect the USB resume operations, > even without asynchronous resume. The code relies on the fact that the > PCI layer registers sibling devices on a slot in order of increasing > function number. There's no guarantee this will remain true in the > future (it may already be wrong AFAIK), so putting in some explicit > list manipulation is the prudent thing to do. I do think we want to keep the slot ordering. One of the silent issues that the device management code has always had is the whole notion of naming stability. Now, udev and various fancy naming schemes solve that at a higher level, but it is still the case that we _really_ want basic things like your PCI controllers to show up in stable order. For example, it is _very_ inconvenient if things like PCI probing ends up allocating different bus numbers (or resource allocations) across reboots even if the hardware hasn't been changed. Just from a debuggability standpoint, that just ends up being a total disaster. For example, we continually hit odd special cases where PCI resource allocation has some unexplained problem because there is some motherboard resource that is hidden and invisible to our allocator. They are rare in the sense that it's usually just a couple of odd laptops or something, but they are not rare in the sense that pretty much _every_ single time we change some resource allocation logic, we find one or two machines that have some issue. Things like that would be total disasters if the core device layer then ended up also not having well-defined ordering. 
This is why I don't want to do asynchronous PCI device probing, for example (ie we probe the hardware synchronously, the PCI driver sets it all up synchronously, and the asynchronous portion is the non-PCI part if any - things like PHY detection, disk spinup etc). So async things are fine, but they have _huge_ disadvantages, and I'll personally take reliability and a stable serial algorithm over an async one as far as possible. That's partly why I really did suggest that we do the async stuff purely in the USB layer, rather than try to put it deeper in the device layer. And if we do support it "natively" in the device layer like Rafael's latest patch, I still think we should be very very nervous about making devices async unless there is a measured - and very noticeable - advantage. So I really don't want to push things any further than absolutely necessary. I do not think that something like "embedded audio" is a reason for async, for example. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 15:38 ` Linus Torvalds @ 2009-12-09 15:57 ` Alan Stern 2009-12-25 17:09 ` Pavel Machek 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-09 15:57 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, 9 Dec 2009, Linus Torvalds wrote: > That's partly why I really did suggest that we do the async stuff purely in > the USB layer, rather than try to put it deeper in the device layer. And > if we do support it "natively" in the device layer like Rafael's latest > patch, I still think we should be very very nervous about making devices > async unless there is a measured - and very noticeable - advantage. Agreed. Arjan's measurements indicated that USB was one of the biggest offenders; everything else other than the PS/2 mouse was much faster. Given these results there isn't much incentive to do anything else asynchronously. (However other devices not present on Arjan's machine may be a different story. Spinning up multiple external disks is a good example -- although here it may be necessary for the driver to take charge, because spinning up a disk requires a lot of power and doing too many of them at the same time could be bad.) Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 15:57 ` Alan Stern @ 2009-12-25 17:09 ` Pavel Machek 0 siblings, 0 replies; 235+ messages in thread From: Pavel Machek @ 2009-12-25 17:09 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list Hi! > > That's partly why I really did suggest that we do the async stuff purely in > > the USB layer, rather than try to put it deeper in the device layer. And > > if we do support it "natively" in the device layer like Rafael's latest > > patch, I still think we should be very very nervous about making devices > > async unless there is a measured - and very noticeable - advantage. > > Agreed. Arjan's measurements indicated that USB was one of the biggest > offenders; everything else other than the PS/2 mouse was much faster. > Given these results there isn't much incentive to do anything else > asynchronously. > > (However other devices not present on Arjan's machine may be a > different story. Spinning up multiple external disks is a good example > -- although here it may be necessary for the driver to take charge, > because spinning up a disk requires a lot of power and doing too many > of them at the same time could be bad.) Well, the system had better be able to supply enough current... because USB disks auto-sleep on their own, and then something like async ls -l /*/* would kill your machine... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 2:35 ` Alan Stern 2009-12-09 2:54 ` Linus Torvalds @ 2009-12-09 13:38 ` Mark Brown 2009-12-09 15:49 ` Alan Stern 1 sibling, 1 reply; 235+ messages in thread From: Mark Brown @ 2009-12-09 13:38 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, Dec 08, 2009 at 09:35:59PM -0500, Alan Stern wrote: > On Tue, 8 Dec 2009, Linus Torvalds wrote: > > It's just that I think the "looping over children" is ugly, when I think > > that by doing it the other way around you can make the code simpler and > > only depend on the PM device list and a simple parent pointer access. > I agree that it is uglier. The only advantage is in handling > asynchronous non-tree suspend dependencies, of which we probably won't > have very many. In fact, I don't know of _any_ offhand. There's some potential for this in embedded audio - it wants to bring down the entire embedded audio subsystem at once before the individual devices (and their parents) get suspended since bringing them down out of sync can result in audible artifacts. Depending on the system the suspend may take a noticeable amount of time so it'd be nice to be able to run it asynchronously, though we don't currently do so. At the minute we get away with this mostly through not being able to represent the cases that are likely to actually trip up over it. > Interestingly, this non-tree dependency problem does not affect resume. Embedded audio does potentially - the resume needs all the individual devices in the subsystem and can take a substantial proportion of the overall resume time. Currently we get away with a combination of assuming that all the drivers are live when we decide to start resuming them and using the ALSA userspace API to deal with bringing the resume out of line, but it's not ideal. ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 13:38 ` Mark Brown @ 2009-12-09 15:49 ` Alan Stern 2009-12-09 16:02 ` Mark Brown 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-09 15:49 UTC (permalink / raw) To: Mark Brown Cc: Linus Torvalds, Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, 9 Dec 2009, Mark Brown wrote: > On Tue, Dec 08, 2009 at 09:35:59PM -0500, Alan Stern wrote: > > On Tue, 8 Dec 2009, Linus Torvalds wrote: > > > > It's just that I think the "looping over children" is ugly, when I think > > > that by doing it the other way around you can make the code simpler and > > > only depend on the PM device list and a simple parent pointer access. > > > I agree that it is uglier. The only advantage is in handling > > asynchronous non-tree suspend dependencies, of which we probably won't > > have very many. In fact, I don't know of _any_ offhand. > > There's some potential for this in embedded audio - it wants to bring > down the entire embedded audio subsystem at once before the individual > devices (and their parents) get suspended since bringing them down out > of sync can result in audible artifacts. Depending on the system the > suspend may take a noticeable amount of time so it'd be nice to be able > to run it asynchronously, though we don't currently do so. For something like bringing down the entire embedded audio subsystem, which isn't directly tied to a single device, you would probably be better off doing it when the PM core broadcasts a suspend notification (see register_pm_notifier() in include/linux/suspend.h). This occurs before any devices are suspended, so synchronization isn't an issue. > At the minute we get away with this mostly through not being able to > represent the cases that are likely to actually trip up over it. > > > Interestingly, this non-tree dependency problem does not affect resume. 
> > Embedded audio does potentially - the resume needs all the individual > devices in the subsystem and can take a substantial proportion of the > overall resume time. Currently we get away with a combination of > assuming that all the drivers are live when we decide to start resuming > them and using the ALSA userspace API to deal with bringing the resume > out of line, but it's not ideal. You can do the same thing with the resume notifier. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 15:49 ` Alan Stern @ 2009-12-09 16:02 ` Mark Brown 2009-12-09 16:23 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Mark Brown @ 2009-12-09 16:02 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, Dec 09, 2009 at 10:49:56AM -0500, Alan Stern wrote: > On Wed, 9 Dec 2009, Mark Brown wrote: > > There's some potential for this in embedded audio - it wants to bring > > down the entire embedded audio subsystem at once before the individual > > devices (and their parents) get suspended since bringing them down out > For something like bringing down the entire embedded audio subsystem, > which isn't directly tied to a single device, you would probably be > better off doing it when the PM core broadcasts a suspend notification > (see register_pm_notifier() in include/linux/suspend.h). This occurs > before any devices are suspended, so synchronization isn't an issue. I'm not convinced that helps with the fact that the suspend may take a long time - ideally we'd be able to start the suspend process off but let other things carry on while it completes without having to worry about something we're relying on getting suspended underneath us. > > Embedded audio does potentially - the resume needs all the individual > > overall resume time. Currently we get away with a combination of > You can do the same thing with the resume notifier. Similarly, the length of time the resume may take to complete means it'd be nice to start as soon as we've got the devices and complete it at our leisure. This is less pressing since we can tell the PM core we've resumed but still block userspace. ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 16:02 ` Mark Brown @ 2009-12-09 16:23 ` Alan Stern 2009-12-09 16:46 ` Mark Brown 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-09 16:23 UTC (permalink / raw) To: Mark Brown Cc: Linus Torvalds, Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, 9 Dec 2009, Mark Brown wrote: > On Wed, Dec 09, 2009 at 10:49:56AM -0500, Alan Stern wrote: > > On Wed, 9 Dec 2009, Mark Brown wrote: > > > > There's some potential for this in embedded audio - it wants to bring > > > down the entire embedded audio subsystem at once before the individual > > > devices (and their parents) get suspended since bringing them down out > > > For something like bringing down the entire embedded audio subsystem, > > which isn't directly tied to a single device, you would probably be > > better off doing it when the PM core broadcasts a suspend notification > > (see register_pm_notifier() in include/linux/suspend.h). This occurs > > before any devices are suspended, so synchronization isn't an issue. > > I'm not convinced that helps with the fact that the suspend may take a > long time - ideally we'd be able to start the suspend process off but > let other things carry on while it completes without having to worry > about something we're relying on getting suspended underneath us. The suspend procedure is oriented around device structures, and what you're talking about isn't. It's something separate which has to be finished before _any_ of the audio devices are suspended. How long does it take to bring down the entire embedded audio subsystem? And how critical is the timing for typical systems? Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 16:23 ` Alan Stern @ 2009-12-09 16:46 ` Mark Brown 2009-12-09 16:57 ` Linus Torvalds 2009-12-09 17:10 ` Alan Stern 0 siblings, 2 replies; 235+ messages in thread From: Mark Brown @ 2009-12-09 16:46 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, Dec 09, 2009 at 11:23:00AM -0500, Alan Stern wrote: > On Wed, 9 Dec 2009, Mark Brown wrote: > > I'm not convinced that helps with the fact that the suspend may take a > > long time - ideally we'd be able to start the suspend process off but > > let other things carry on while it completes without having to worry > > about something we're relying on getting suspended underneath us. > The suspend procedure is oriented around device structures, and what > you're talking about isn't. It's something separate which has to be > finished before _any_ of the audio devices are suspended. In this context the "subsystem" actually has a struct device associated with it so does appear in the device flow. > How long does it take to bring down the entire embedded audio > subsystem? And how critical is the timing for typical systems? Worst case is about a second for both resume and suspend which means two seconds total but it's very hardware dependant. The latency budget for suspend and resume are both zero in an ideal world, users want to be able to suspend as much as possible which means they'd like it to take no perceptible time at the human level. Some hardware is at the point where that's getting realistic but the folks on older hardware still want to get as close to that as they can. ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 16:46 ` Mark Brown @ 2009-12-09 16:57 ` Linus Torvalds 2009-12-09 17:45 ` Mark Brown 2009-12-09 17:10 ` Alan Stern 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-09 16:57 UTC (permalink / raw) To: Mark Brown Cc: Alan Stern, Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, 9 Dec 2009, Mark Brown wrote: > > How long does it take to bring down the entire embedded audio > > subsystem? And how critical is the timing for typical systems? > > Worst case is about a second for both resume and suspend which means two > seconds total but it's very hardware dependant. I would seriously suggest just looking at the code itself. Maybe the code is just plain sh*t? If we're talking embedded audio, we're generally talking SoC chips (maybe some external audio daughtercard), and quite frankly, it sounds to me like you're just wasting your own time. There is no way that kind of hardware really needs that much time. We should not design the device infrastructure for crap coding. Now, I can easily see one-second delays in code that simply has never been thought about or cared about it. We used to have things like that in the serial code where just probing for non-existent serial ports took half a second per port because there was a timeout. But christ, using that as an argument for "we should do things asynchronously" sounds like a crazy idea. Why not just take a hard look at the driver in question, asking hard questions like "does it really need to do something horrible like that"? Because bad coding is much more likely to be the real reason. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 16:57 ` Linus Torvalds @ 2009-12-09 17:45 ` Mark Brown 2009-12-09 17:57 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Mark Brown @ 2009-12-09 17:45 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, Dec 09, 2009 at 08:57:32AM -0800, Linus Torvalds wrote: > On Wed, 9 Dec 2009, Mark Brown wrote: > > Worst case is about a second for both resume and suspend which means two > > seconds total but it's very hardware dependant. > I would seriously suggest just looking at the code itself. > Maybe the code is just plain sh*t? If we're talking embedded audio, we're > generally talking SoC chips (maybe some external audio daughtercard), and Yes, usually this is a SoC plus one or more external devices handling the mixed signal parts of things all soldered down onto a board. > quite frankly, it sounds to me like you're just wasting your own time. > There is no way that kind of hardware really needs that much time. Some of the older hardware really does need that much time, sadly. More recent hardware got that down much lower (into the low hundreds of ms where it's much less of an issue but still present) and current generations basically don't have the problem any more but for worst case a second is a good approximation. The problem comes when you've got audio outputs referenced to something other than ground which used to happen because no negative supplies were available in these systems. To bring these up from cold you need to bring the outputs up to the reference level but if you do that by just turning on the power you get an audible (often loud) noise in the output from the square(ish) waveform that results which users don't find acceptable. 
The initial solution was to ramp the voltage on the outputs in such a way that the waveform that appears on the outputs isn't audible, which broadly boils down to ramping it slowly. People were very aware of the problems so later generations of devices added features which allowed this to happen much more quickly than the original implementations had, but still noticeably slow in terms of the timescales people need. Current generation hardware solves the problem by using charge pumps to provide a negative supply, allowing ground referenced outputs which are just a win all round for this and other reasons. They're fast enough to allow the power up to be brought completely in line with the start of the audio stream, taking this out of suspend and resume entirely. > Now, I can easily see one-second delays in code that simply has never been > thought about or cared about it. We used to have things like that in the > serial code where just probing for non-existent serial ports took half a > second per port because there was a timeout. It's a deliberate delay waiting for the voltages to ramp; there are plenty of things that need to be fixed or optimised in the code but those that are causing issues these days really are just explicitly inserted delays waiting for things to happen in hardware that do actually take that long. > Because bad coding is much more likely to be the real reason. Would that it were - you wouldn't believe the amount of time that's been spent over the years tuning for this. ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 17:45 ` Mark Brown @ 2009-12-09 17:57 ` Linus Torvalds 2009-12-09 18:27 ` Mark Brown 0 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-09 17:57 UTC (permalink / raw) To: Mark Brown Cc: Alan Stern, Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, 9 Dec 2009, Mark Brown wrote: > > The problem comes when you've got audio outputs referenced to something > other than ground which used to happen because no negative supplies were > available in these systems. To bring these up from cold you need to > bring the outputs up to the reference level but if you do that by just > turning on the power you get an audible (often loud) noise in the output > from the square(ish) waveform that results which users don't find > acceptable. Ouch. A second still sounds way too long - but whatever. However, it sounds like the nice way to do that isn't by doing it synchronously in the suspend/resume code itself, but simply ramping it down (and up) from a timer. It would be asynchronous, but not because the suspend itself is in any way asynchronous. Done right, it might even result in a nice volume fade of the sound (ie if the hw allows for it, stop the actual sound engine late on suspend, and start it early on resume, so that sound works _while_ the whole reference volume rampdown/up is going on) Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 17:57 ` Linus Torvalds @ 2009-12-09 18:27 ` Mark Brown 0 siblings, 0 replies; 235+ messages in thread From: Mark Brown @ 2009-12-09 18:27 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, Dec 09, 2009 at 09:57:22AM -0800, Linus Torvalds wrote: > On Wed, 9 Dec 2009, Mark Brown wrote: > > The problem comes when you've got audio outputs referenced to something > > other than ground which used to happen because no negative supplies were > > available in these systems. To bring these up from cold you need to > > bring the outputs up to the reference level but if you do that by just > > turning on the power you get an audible (often loud) noise in the output > > from the square(ish) waveform that results which users don't find > > acceptable. > Ouch. A second still sounds way too long - but whatever. Yes, I think there's pretty much universal agreement on that :) Hardware that needs a few hundred milliseconds is much more common at the minute (and like I say current generation hardware is basically unaffected), but it's the number I keep in mind when considering how bad things might be. > However, it sounds like the nice way to do that isn't by doing it > synchronously in the suspend/resume code itself, but simply ramping it > down (and up) from a timer. It would be asynchronous, but not because the > suspend itself is in any way asynchronous. We don't actually need a timer for most of this - generally the ramp is done by charging or discharging a capacitor through a resistor so you just set it going then wait, possibly in several stages with a little bit twiddling in the middle to speed things up which could be done off a timer.
> Done right, it might even result in a nice volume fade of the sound (ie if > the hw allows for it, stop the actual sound engine late on suspend, and > start it early on resume, so that sound works _while_ the whole reference > volume rampdown/up is going on) The big issue with running off a partially ramped supply is that it can upset the analogue components - for example, if an amplifier is trying to handle a signal with an amplitude outside the supply range then it'll clip. But sometimes that approach does work and it does get used. For resume we're pretty much taking care of it already by moving the resume out of the main device resume and using ALSA-specific stuff to keep audio streams stopped until we're done but for suspend we don't know the system is going down until the suspend starts and we do want to make sure we got the analogue into a known poweroff state so that we can control powerup properly. ^ permalink raw reply [flat|nested] 235+ messages in thread
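[Editorial aside, not part of the thread] As a rough illustration of why the capacitor ramp Mark describes above takes so long: if the reference is modelled as one ideal capacitor C charged through a resistor R (a simplification, since he mentions real parts may ramp in several stages), the time to get within a fraction ε of the target follows from the RC time constant. The component values in the comments are hypothetical, picked only to land near the one-second worst case quoted in the thread.

```latex
V(t) = V_{\mathrm{ref}}\left(1 - e^{-t/RC}\right)
\quad\Longrightarrow\quad
t_{\varepsilon} = RC \,\ln\frac{1}{\varepsilon}
% Hypothetical values: R = 100 k\Omega, C = 2.2 \mu F, so RC = 0.22 s.
% For \varepsilon = 0.01 (within 1% of V_ref): t = 0.22 s \times \ln 100 \approx 1.0 s.
```

Under this model no amount of driver tuning shortens the wait; only a smaller RC product (or a different output topology) does, which matches Mark's point that the delay is in the hardware.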
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 16:46 ` Mark Brown 2009-12-09 16:57 ` Linus Torvalds @ 2009-12-09 17:10 ` Alan Stern 2009-12-09 17:19 ` Linus Torvalds 2009-12-09 18:08 ` Mark Brown 1 sibling, 2 replies; 235+ messages in thread From: Alan Stern @ 2009-12-09 17:10 UTC (permalink / raw) To: Mark Brown Cc: Linus Torvalds, Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, 9 Dec 2009, Mark Brown wrote: > > How long does it take to bring down the entire embedded audio > > subsystem? And how critical is the timing for typical systems? > > Worst case is about a second for both resume and suspend which means two > seconds total but it's very hardware dependant. A second seems awfully long. What happens if audio isn't being played when the suspend occurs? Can't you shorten things with no artifacts in that case? If audio _is_ being played when a suspend occurs, users probably don't mind audible artifacts. In fact, they probably expect some. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 17:10 ` Alan Stern @ 2009-12-09 17:19 ` Linus Torvalds 2009-12-09 18:08 ` Mark Brown 1 sibling, 0 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-09 17:19 UTC (permalink / raw) To: Alan Stern Cc: Mark Brown, Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, 9 Dec 2009, Alan Stern wrote: > > If audio _is_ being played when a suspend occurs, users probably don't > mind audible artifacts. In fact, they probably expect some. I'd say it's physically impossible not to get them. If you're really suspending your audio hardware, it _will_ be quiet ;) I suspect somebody is draining existing queues or something, or just probing for an external analog part. Neither of which is really sensible or absolutely required in an embedded suspend/resume kind of situation. Especially for STR, just "leave all the data structures around, and just stop the DMA engine" is often a perfectly fine solution - but drivers don't do it, exactly because we've often had the mentality that you re-initialize everything under the sun. I can see _why_ a driver would do that ("we re-use the same code that we use on close/open or module unload/reload"), but it doesn't change the fact that it's stupid to do if you worry about latency. And yeah, turning it async might hide the problem. But the code word there is "hide" rather than "fix". Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-09 17:10 ` Alan Stern 2009-12-09 17:19 ` Linus Torvalds @ 2009-12-09 18:08 ` Mark Brown 1 sibling, 0 replies; 235+ messages in thread From: Mark Brown @ 2009-12-09 18:08 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, Dec 09, 2009 at 12:10:03PM -0500, Alan Stern wrote: > On Wed, 9 Dec 2009, Mark Brown wrote: > > Worst case is about a second for both resume and suspend which means two > > seconds total but it's very hardware dependant. > A second seems awfully long. What happens if audio isn't being played > when the suspend occurs? Can't you shorten things with no artifacts in > that case? For the affected hardware the problem is basically the same with or without audio being played. As I said in my reply to Linus this is delays caused by ramping reference voltages. These delays are sufficiently long that the reference voltages have to be maintained all the time so that they don't delay the start of audio streams which means that having or not having an audio stream at suspend time doesn't affect the reference voltage ramps since we don't turn them off when not in use. There is a win from other stuff having been shut off already, but it's already being exploited. On suspend the problem is the same as for resume - we need to ramp the voltages quietly, this time down to zero. We want to make sure they're actually at zero to ensure that the ramp at resume time starts from a known hardware state. ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 19:44 ` Rafael J. Wysocki 2009-12-08 20:16 ` Alan Stern @ 2009-12-08 21:04 ` Linus Torvalds 2009-12-08 21:40 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-08 21:04 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > Anyway, if we use an rwsem, it won't be checkable from interrupt context just > as well. You can't do a lock() from an interrupt, but the unlocks should be irq-safe. > Suppose we use rwsem and during suspend each child uses a down_read() on a > parent and then the parent uses down_write() on itself. What if, whatever the > reason, the parent is a bit early and does the down_write() before one of the > children has a chance to do the down_read()? Aren't we toast? We're toast, but we're toast for a totally unrelated reason: it means that you tried to resume a child before a parent, which would be a major bug to begin with. Look, I even wrote out the comments, so let me repeat the code one more time.

 - suspend time calling:
	// This won't block, because we suspend nodes before parents
	down_read(node->parent->lock);
	// Do the part that may block asynchronously
	async_schedule(do_usb_node_suspend, node);

 - resume time calling:
	// This won't block, because we resume parents before children,
	// and the children will take the read lock.
	down_write(leaf->lock);
	// Do the blocking part asynchronously
	async_schedule(usb_node_resume, leaf);

See? So when we take the parent lock for suspend, we are guaranteed to do so _before_ the parent node itself suspends. And conversely, when we take the parent lock (asynchronously) for resume, we're guaranteed to do that _after_ the parent node has done its own down_write. And that all depends on just one trivial thing; that the suspend and resume is called in the right order (children first vs parent first respectively). And that is such a _major_ correctness issue that if that isn't correct, your suspend isn't going to work _anyway_. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
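[Editorial aside, not part of the thread] The resume half of the scheme Linus describes can be tried out in userspace. The following is only a sketch under stated assumptions: `struct rwsem`, `struct node`, `node_resume()`, `resume_all()` and `chain_resumes_in_order()` are hypothetical names, a mutex/condvar pair stands in for the kernel's `rw_semaphore` (a `pthread_rwlock_t` will not do, because here the write lock is released by a different thread than the one that took it), and one pthread per node stands in for `async_schedule()`. It is not the kernel implementation.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace stand-in for a kernel rw_semaphore; any thread may release it. */
struct rwsem {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	int readers, writer;	/* active readers; nonzero while write-held */
};

static void down_read(struct rwsem *s)
{
	pthread_mutex_lock(&s->mtx);
	while (s->writer)
		pthread_cond_wait(&s->cv, &s->mtx);
	s->readers++;
	pthread_mutex_unlock(&s->mtx);
}

static void down_write(struct rwsem *s)
{
	pthread_mutex_lock(&s->mtx);
	while (s->writer || s->readers)
		pthread_cond_wait(&s->cv, &s->mtx);
	s->writer = 1;
	pthread_mutex_unlock(&s->mtx);
}

/* Drop one reader (was_writer == 0) or the writer, then wake all waiters. */
static void rwsem_up(struct rwsem *s, int was_writer)
{
	pthread_mutex_lock(&s->mtx);
	if (was_writer)
		s->writer = 0;
	else
		s->readers--;
	pthread_cond_broadcast(&s->cv);
	pthread_mutex_unlock(&s->mtx);
}
#define up_read(s)	rwsem_up((s), 0)
#define up_write(s)	rwsem_up((s), 1)

struct node {
	struct node *parent;	/* NULL for the root */
	struct rwsem lock;
	pthread_t thread;
	int resumed_at;		/* depth at which it resumed; 0 = not yet */
};

void node_init(struct node *n, struct node *parent)
{
	n->parent = parent;
	n->resumed_at = 0;
	n->lock.readers = n->lock.writer = 0;
	pthread_mutex_init(&n->lock.mtx, NULL);
	pthread_cond_init(&n->lock.cv, NULL);
}

/* Asynchronous part: wait for the parent, "resume", release our own lock. */
static void *node_resume(void *data)
{
	struct node *n = data;
	struct node *p = n->parent;

	if (p)
		down_read(&p->lock);	/* blocks until the parent is done */
	assert(!p || p->resumed_at > 0);
	n->resumed_at = p ? p->resumed_at + 1 : 1;
	if (p)
		up_read(&p->lock);
	up_write(&n->lock);		/* lets our own children proceed */
	return NULL;
}

/* nodes[] must list every parent before its children. */
void resume_all(struct node **nodes, int count)
{
	for (int i = 0; i < count; i++) {
		down_write(&nodes[i]->lock);	/* uncontended: parents first */
		pthread_create(&nodes[i]->thread, NULL, node_resume, nodes[i]);
	}
	for (int i = 0; i < count; i++)	/* like async_synchronize_full() */
		pthread_join(nodes[i]->thread, NULL);
}

/* Returns 1 if a root -> hub -> leaf chain resumed strictly top-down. */
int chain_resumes_in_order(void)
{
	struct node root, hub, leaf;
	struct node *order[] = { &root, &hub, &leaf };

	node_init(&root, NULL);
	node_init(&hub, &root);
	node_init(&leaf, &hub);
	resume_all(order, 3);
	return root.resumed_at == 1 && hub.resumed_at == 2 && leaf.resumed_at == 3;
}
```

Build with `-pthread` and call `chain_resumes_in_order()` from a `main()`. The dispatcher never blocks, because it takes each write lock before any reader can exist; all the waiting happens in the per-node threads, which is the property the mail above argues for. Note the `if (p)` guards: a root node has no parent lock to take.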
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 21:04 ` Linus Torvalds @ 2009-12-08 21:40 ` Rafael J. Wysocki 2009-12-08 22:03 ` Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-08 21:40 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tuesday 08 December 2009, Linus Torvalds wrote: > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > > Anyway, if we use an rwsem, it won't be checkable from interrupt context just > > as well. > > You can't do a lock() from an interrupt, but the unlocks should be > irq-safe. > > > Suppose we use rwsem and during suspend each child uses a down_read() on a > > parent and then the parent uses down_write() on itself. What if, whatever the > > reason, the parent is a bit early and does the down_write() before one of the > > children has a chance to do the down_read()? Aren't we toast? > > We're toast, but we're toast for a totally unrealted reason: it means that > you tried to resume a child before a parent, which would be a major bug to > begin with. > > Look, I even wrote out the comments, so let me repeat the code one more > time. > > - suspend time calling: > // This won't block, because we suspend nodes before parents > down_read(node->parent->lock); > // Do the part that may block asynchronously > async_schedule(do_usb_node_suspend, node); > > - resume time calling: > // This won't block, because we resume parents before children, > // and the children will take the read lock. > down_write(leaf->lock); > // Do the blocking part asynchronously > async_schedule(usb_node_resume, leaf); > > See? So when we take the parent lock for suspend, we are guaranteed to do > so _before_ the parent node itself suspends. And conversely, when we take > the parent lock (asynchronously) for resume, we're guaranteed to do that > _after_ the parent node has done its own down_write. 
> > And that all depends on just one trivial thing; that the suspend and > resume is called in the right order (children first vs parent first > respectively). And that is such a _major_ correctness issue that if that > isn't correct, your suspend isn't going to work _anyway_. Understood (I think). Let's try it, then. Below is the resume patch based on my previous one in this thread (I have only verified that it builds). Is that along the lines you want? Rafael

---
 drivers/base/power/main.c    |   78 ++++++++++++++++++++++++++++++++++++++-----
 include/linux/device.h       |    6 +++
 include/linux/pm.h           |    3 +
 include/linux/resume-trace.h |    7 +++
 4 files changed, 85 insertions(+), 9 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -26,6 +26,7 @@
 #include <linux/spinlock.h>
 #include <linux/wait.h>
 #include <linux/timer.h>
+#include <linux/rwsem.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -412,9 +413,11 @@ struct dev_pm_info {
 	pm_message_t		power_state;
 	unsigned int		can_wakeup:1;
 	unsigned int		should_wakeup:1;
+	unsigned		async_suspend:1;
 	enum dpm_state		status;		/* Owned by the PM core */
 #ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
+	struct rw_semaphore	rwsem;
 #endif
 #ifdef CONFIG_PM_RUNTIME
 	struct timer_list	suspend_timer;
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -472,6 +472,12 @@ static inline int device_is_registered(s
 	return dev->kobj.state_in_sysfs;
 }
 
+static inline void device_enable_async_suspend(struct device *dev, bool enable)
+{
+	if (dev->power.status == DPM_ON)
+		dev->power.async_suspend = enable;
+}
+
 void driver_init(void);
 
 /*
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -25,6 +25,7 @@
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
+#include <linux/async.h>
 
 #include "../base.h"
 #include "power.h"
@@ -42,6 +43,7 @@
 LIST_HEAD(dpm_list);
 
 static DEFINE_MUTEX(dpm_list_mtx);
+static pm_message_t pm_transition;
 
 /*
  * Set once the preparation of devices for a PM transition has started, reset
@@ -56,6 +58,7 @@ static bool transition_started;
 void device_pm_init(struct device *dev)
 {
 	dev->power.status = DPM_ON;
+	init_rwsem(&dev->power.rwsem);
 	pm_runtime_init(dev);
 }
 
@@ -334,25 +337,51 @@ static void pm_dev_err(struct device *de
  * The driver of @dev will not receive interrupts while this function is being
  * executed.
  */
-static int device_resume_noirq(struct device *dev, pm_message_t state)
+static int __device_resume_noirq(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
-	if (!dev->bus)
-		goto End;
+	down_read(&dev->parent->power.rwsem);
 
-	if (dev->bus->pm) {
+	if (dev->bus && dev->bus->pm) {
 		pm_dev_dbg(dev, state, "EARLY ");
 		error = pm_noirq_op(dev, dev->bus->pm, state);
 	}
- End:
+
+	up_read(&dev->parent->power.rwsem);
+	up_write(&dev->power.rwsem);
+
 	TRACE_RESUME(error);
 	return error;
 }
 
+static void async_resume_noirq(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error;
+
+	error = __device_resume_noirq(dev, pm_transition);
+	if (error)
+		pm_dev_err(dev, pm_transition, " async EARLY", error);
+	put_device(dev);
+}
+
+static int device_resume_noirq(struct device *dev)
+{
+	down_write(&dev->power.rwsem);
+
+	if (dev->power.async_suspend && !pm_trace_is_enabled()) {
+		get_device(dev);
+		async_schedule(async_resume_noirq, dev);
+		return 0;
+	}
+
+	return __device_resume_noirq(dev, pm_transition);
+}
+
 /**
  * dpm_resume_noirq - Execute "early resume" callbacks for non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -366,32 +395,35 @@ void dpm_resume_noirq(pm_message_t state
 
 	mutex_lock(&dpm_list_mtx);
 	transition_started = false;
+	pm_transition = state;
 	list_for_each_entry(dev, &dpm_list, power.entry)
 		if (dev->power.status > DPM_OFF) {
 			int error;
 
 			dev->power.status = DPM_OFF;
-			error = device_resume_noirq(dev, state);
+			error = device_resume_noirq(dev);
 			if (error)
 				pm_dev_err(dev, state, " early", error);
 		}
 	mutex_unlock(&dpm_list_mtx);
+	async_synchronize_full();
 	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(dpm_resume_noirq);
 
 /**
- * device_resume - Execute "resume" callbacks for given device.
+ * __device_resume - Execute "resume" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
  */
-static int device_resume(struct device *dev, pm_message_t state)
+static int __device_resume(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
+	down_read(&dev->parent->power.rwsem);
 	down(&dev->sem);
 
 	if (dev->bus) {
@@ -426,11 +458,37 @@ static int device_resume(struct device *
 	}
  End:
 	up(&dev->sem);
+	up_read(&dev->parent->power.rwsem);
+	up_write(&dev->power.rwsem);
 
 	TRACE_RESUME(error);
 	return error;
 }
 
+static void async_resume(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error;
+
+	error = __device_resume(dev, pm_transition);
+	if (error)
+		pm_dev_err(dev, pm_transition, " async", error);
+	put_device(dev);
+}
+
+static int device_resume(struct device *dev)
+{
+	down_write(&dev->power.rwsem);
+
+	if (dev->power.async_suspend && !pm_trace_is_enabled()) {
+		get_device(dev);
+		async_schedule(async_resume, dev);
+		return 0;
+	}
+
+	return __device_resume(dev, pm_transition);
+}
+
 /**
  * dpm_resume - Execute "resume" callbacks for non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -444,6 +502,7 @@ static void dpm_resume(pm_message_t stat
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);
 
@@ -454,7 +513,7 @@ static void dpm_resume(pm_message_t stat
 			dev->power.status = DPM_RESUMING;
 			mutex_unlock(&dpm_list_mtx);
 
-			error = device_resume(dev, state);
+			error = device_resume(dev);
 
 			mutex_lock(&dpm_list_mtx);
 			if (error)
@@ -469,6 +528,7 @@ static void dpm_resume(pm_message_t stat
 	}
 	list_splice(&list, &dpm_list);
 	mutex_unlock(&dpm_list_mtx);
+	async_synchronize_full();
 }
 
 /**
Index: linux-2.6/include/linux/resume-trace.h
===================================================================
--- linux-2.6.orig/include/linux/resume-trace.h
+++ linux-2.6/include/linux/resume-trace.h
@@ -6,6 +6,11 @@
 
 extern int pm_trace_enabled;
 
+static inline int pm_trace_is_enabled(void)
+{
+	return pm_trace_enabled;
+}
+
 struct device;
 extern void set_trace_device(struct device *);
 extern void generate_resume_trace(const void *tracedata, unsigned int user);
@@ -17,6 +22,8 @@ extern void generate_resume_trace(const 
 
 #else
 
+static inline int pm_trace_is_enabled(void) { return 0; }
+
 #define TRACE_DEVICE(dev) do { } while (0)
 #define TRACE_RESUME(dev) do { } while (0)
 
^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 21:40 ` Rafael J. Wysocki @ 2009-12-08 22:03 ` Rafael J. Wysocki 2009-12-08 22:55 ` Async suspend-resume patch w/ rwsems " Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-08 22:03 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tuesday 08 December 2009, Rafael J. Wysocki wrote: > On Tuesday 08 December 2009, Linus Torvalds wrote: > > > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > > > > Anyway, if we use an rwsem, it won't be checkable from interrupt context just > > > as well. > > > > You can't do a lock() from an interrupt, but the unlocks should be > > irq-safe. > > > > > Suppose we use rwsem and during suspend each child uses a down_read() on a > > > parent and then the parent uses down_write() on itself. What if, whatever the > > > reason, the parent is a bit early and does the down_write() before one of the > > > children has a chance to do the down_read()? Aren't we toast? > > > > We're toast, but we're toast for a totally unrealted reason: it means that > > you tried to resume a child before a parent, which would be a major bug to > > begin with. > > > > Look, I even wrote out the comments, so let me repeat the code one more > > time. > > > > - suspend time calling: > > // This won't block, because we suspend nodes before parents > > down_read(node->parent->lock); > > // Do the part that may block asynchronously > > async_schedule(do_usb_node_suspend, node); > > > > - resume time calling: > > // This won't block, because we resume parents before children, > > // and the children will take the read lock. > > down_write(leaf->lock); > > // Do the blocking part asynchronously > > async_schedule(usb_node_resume, leaf); > > > > See? So when we take the parent lock for suspend, we are guaranteed to do > > so _before_ the parent node itself suspends. 
And conversely, when we take > > the parent lock (asynchronously) for resume, we're guaranteed to do that > > _after_ the parent node has done its own down_write. > > > > And that all depends on just one trivial thing; that the suspend and > > resume is called in the right order (children first vs parent first > > respectively). And that is such a _major_ correctness issue that if that > > isn't correct, your suspend isn't going to work _anyway_. > > Understood (I think). > > Let's try it, then. Below is the resume patch based on my previous one in this > thread (I have only verified that it builds). Ah, I need to check if dev->parent is not NULL before trying to lock it, but apart from this it doesn't break things at least. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Async suspend-resume patch w/ rwsems (was: Re: [GIT PULL] PM updates for 2.6.33) 2009-12-08 22:03 ` Rafael J. Wysocki @ 2009-12-08 22:55 ` Rafael J. Wysocki 2009-12-08 23:24 ` Rafael J. Wysocki 2009-12-09 20:15 ` Alan Stern 0 siblings, 2 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-08 22:55 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tuesday 08 December 2009, Rafael J. Wysocki wrote: > On Tuesday 08 December 2009, Rafael J. Wysocki wrote: > > On Tuesday 08 December 2009, Linus Torvalds wrote: > > > > > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > > > > > > Anyway, if we use an rwsem, it won't be checkable from interrupt context just > > > > as well. > > > > > > You can't do a lock() from an interrupt, but the unlocks should be > > > irq-safe. > > > > > > > Suppose we use rwsem and during suspend each child uses a down_read() on a > > > > parent and then the parent uses down_write() on itself. What if, whatever the > > > > reason, the parent is a bit early and does the down_write() before one of the > > > > children has a chance to do the down_read()? Aren't we toast? > > > > > > We're toast, but we're toast for a totally unrealted reason: it means that > > > you tried to resume a child before a parent, which would be a major bug to > > > begin with. > > > > > > Look, I even wrote out the comments, so let me repeat the code one more > > > time. > > > > > > - suspend time calling: > > > // This won't block, because we suspend nodes before parents > > > down_read(node->parent->lock); > > > // Do the part that may block asynchronously > > > async_schedule(do_usb_node_suspend, node); > > > > > > - resume time calling: > > > // This won't block, because we resume parents before children, > > > // and the children will take the read lock. > > > down_write(leaf->lock); > > > // Do the blocking part asynchronously > > > async_schedule(usb_node_resume, leaf); > > > > > > See? 
So when we take the parent lock for suspend, we are guaranteed to do > > > so _before_ the parent node itself suspends. And conversely, when we take > > > the parent lock (asynchronously) for resume, we're guaranteed to do that > > > _after_ the parent node has done its own down_write. > > > > > > And that all depends on just one trivial thing; that the suspend and > > > resume is called in the right order (children first vs parent first > > > respectively). And that is such a _major_ correctness issue that if that > > > isn't correct, your suspend isn't going to work _anyway_. > > > > Understood (I think). > > > > Let's try it, then. Below is the resume patch based on my previous one in this > > thread (I have only verified that it builds). > > Ah, I need to check if dev->parent is not NULL before trying to lock it, but > apart from this it doesn't break things at least. For completness, below is the full async suspend/resume patch with rwlocks, that has been (very slightly) tested and doesn't seem to break things. [Note to Alan: lockdep doesn't seem to complain about the not annotated nested locks.] Thanks, Rafael --- drivers/base/power/main.c | 195 +++++++++++++++++++++++++++++++++++++++---- include/linux/device.h | 6 + include/linux/pm.h | 3 include/linux/resume-trace.h | 7 + 4 files changed, 194 insertions(+), 17 deletions(-) Index: linux-2.6/include/linux/pm.h =================================================================== --- linux-2.6.orig/include/linux/pm.h +++ linux-2.6/include/linux/pm.h @@ -26,6 +26,7 @@ #include <linux/spinlock.h> #include <linux/wait.h> #include <linux/timer.h> +#include <linux/rwsem.h> /* * Callbacks for platform drivers to implement. 
@@ -412,9 +413,11 @@ struct dev_pm_info {
 	pm_message_t power_state;
 	unsigned int can_wakeup:1;
 	unsigned int should_wakeup:1;
+	unsigned async_suspend:1;
 	enum dpm_state status; /* Owned by the PM core */
 #ifdef CONFIG_PM_SLEEP
 	struct list_head entry;
+	struct rw_semaphore rwsem;
 #endif
 #ifdef CONFIG_PM_RUNTIME
 	struct timer_list suspend_timer;
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -472,6 +472,12 @@ static inline int device_is_registered(s
 	return dev->kobj.state_in_sysfs;
 }
 
+static inline void device_enable_async_suspend(struct device *dev, bool enable)
+{
+	if (dev->power.status == DPM_ON)
+		dev->power.async_suspend = enable;
+}
+
 void driver_init(void);
 
 /*
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -25,6 +25,7 @@
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
+#include <linux/async.h>
 
 #include "../base.h"
 #include "power.h"
@@ -42,6 +43,7 @@
 LIST_HEAD(dpm_list);
 
 static DEFINE_MUTEX(dpm_list_mtx);
+static pm_message_t pm_transition;
 
 /*
  * Set once the preparation of devices for a PM transition has started, reset
@@ -56,6 +58,7 @@ static bool transition_started;
 void device_pm_init(struct device *dev)
 {
 	dev->power.status = DPM_ON;
+	init_rwsem(&dev->power.rwsem);
 	pm_runtime_init(dev);
 }
 
@@ -334,25 +337,53 @@ static void pm_dev_err(struct device *de
  * The driver of @dev will not receive interrupts while this function is being
  * executed.
  */
-static int device_resume_noirq(struct device *dev, pm_message_t state)
+static int __device_resume_noirq(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
-	if (!dev->bus)
-		goto End;
+	if (dev->parent)
+		down_read(&dev->parent->power.rwsem);
 
-	if (dev->bus->pm) {
+	if (dev->bus && dev->bus->pm) {
 		pm_dev_dbg(dev, state, "EARLY ");
 		error = pm_noirq_op(dev, dev->bus->pm, state);
 	}
- End:
+
+	if (dev->parent)
+		up_read(&dev->parent->power.rwsem);
+	up_write(&dev->power.rwsem);
+
 	TRACE_RESUME(error);
 	return error;
 }
 
+static void async_resume_noirq(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error;
+
+	error = __device_resume_noirq(dev, pm_transition);
+	if (error)
+		pm_dev_err(dev, pm_transition, " async EARLY", error);
+	put_device(dev);
+}
+
+static int device_resume_noirq(struct device *dev)
+{
+	down_write(&dev->power.rwsem);
+
+	if (dev->power.async_suspend && !pm_trace_is_enabled()) {
+		get_device(dev);
+		async_schedule(async_resume_noirq, dev);
+		return 0;
+	}
+
+	return __device_resume_noirq(dev, pm_transition);
+}
+
 /**
  * dpm_resume_noirq - Execute "early resume" callbacks for non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -366,32 +397,36 @@ void dpm_resume_noirq(pm_message_t state
 
 	mutex_lock(&dpm_list_mtx);
 	transition_started = false;
+	pm_transition = state;
 	list_for_each_entry(dev, &dpm_list, power.entry)
 		if (dev->power.status > DPM_OFF) {
 			int error;
 
 			dev->power.status = DPM_OFF;
-			error = device_resume_noirq(dev, state);
+			error = device_resume_noirq(dev);
 			if (error)
 				pm_dev_err(dev, state, " early", error);
 		}
 	mutex_unlock(&dpm_list_mtx);
+	async_synchronize_full();
 	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(dpm_resume_noirq);
 
 /**
- * device_resume - Execute "resume" callbacks for given device.
+ * __device_resume - Execute "resume" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
  */
-static int device_resume(struct device *dev, pm_message_t state)
+static int __device_resume(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
+	if (dev->parent)
+		down_read(&dev->parent->power.rwsem);
 	down(&dev->sem);
 
 	if (dev->bus) {
@@ -426,11 +461,38 @@ static int device_resume(struct device *
 	}
  End:
 	up(&dev->sem);
+	if (dev->parent)
+		up_read(&dev->parent->power.rwsem);
+	up_write(&dev->power.rwsem);
 
 	TRACE_RESUME(error);
 	return error;
 }
 
+static void async_resume(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error;
+
+	error = __device_resume(dev, pm_transition);
+	if (error)
+		pm_dev_err(dev, pm_transition, " async", error);
+	put_device(dev);
+}
+
+static int device_resume(struct device *dev)
+{
+	down_write(&dev->power.rwsem);
+
+	if (dev->power.async_suspend && !pm_trace_is_enabled()) {
+		get_device(dev);
+		async_schedule(async_resume, dev);
+		return 0;
+	}
+
+	return __device_resume(dev, pm_transition);
+}
+
 /**
  * dpm_resume - Execute "resume" callbacks for non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -444,6 +506,7 @@ static void dpm_resume(pm_message_t stat
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);
 
@@ -454,7 +517,7 @@ static void dpm_resume(pm_message_t stat
 			dev->power.status = DPM_RESUMING;
 			mutex_unlock(&dpm_list_mtx);
 
-			error = device_resume(dev, state);
+			error = device_resume(dev);
 
 			mutex_lock(&dpm_list_mtx);
 			if (error)
@@ -469,6 +532,7 @@ static void dpm_resume(pm_message_t stat
 	}
 	list_splice(&list, &dpm_list);
 	mutex_unlock(&dpm_list_mtx);
+	async_synchronize_full();
 }
 
 /**
@@ -533,6 +597,8 @@ static void dpm_complete(pm_message_t st
 	mutex_unlock(&dpm_list_mtx);
 }
 
+static atomic_t async_error;
+
 /**
  * dpm_resume_end - Execute "resume" callbacks and complete system transition.
  * @state: PM transition of the system being carried out.
@@ -580,20 +646,59 @@ static pm_message_t resume_event(pm_mess
  * The driver of @dev will not receive interrupts while this function is being
  * executed.
  */
-static int device_suspend_noirq(struct device *dev, pm_message_t state)
+static int __device_suspend_noirq(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
-	if (!dev->bus)
-		return 0;
+	down_write(&dev->power.rwsem);
 
-	if (dev->bus->pm) {
+	if (dev->bus && dev->bus->pm) {
 		pm_dev_dbg(dev, state, "LATE ");
 		error = pm_noirq_op(dev, dev->bus->pm, state);
 	}
+
+	up_write(&dev->power.rwsem);
+	if (dev->parent)
+		up_read(&dev->parent->power.rwsem);
+
 	return error;
 }
 
+static void async_suspend_noirq(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error = atomic_read(&async_error);
+
+	if (error) {
+		if (dev->parent)
+			up_read(&dev->parent->power.rwsem);
+		dev->power.status = DPM_OFF;
+		return;
+	}
+
+	error = __device_suspend_noirq(dev, pm_transition);
+	if (error) {
+		pm_dev_err(dev, pm_transition, " async LATE", error);
+		dev->power.status = DPM_OFF;
+		atomic_set(&async_error, error);
+	}
+	put_device(dev);
+}
+
+static int device_suspend_noirq(struct device *dev)
+{
+	if (dev->parent)
+		down_read(&dev->parent->power.rwsem);
+
+	if (dev->power.async_suspend) {
+		get_device(dev);
+		async_schedule(async_suspend_noirq, dev);
+		return 0;
+	}
+
+	return __device_suspend_noirq(dev, pm_transition);
+}
+
 /**
  * dpm_suspend_noirq - Execute "late suspend" callbacks for non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -608,15 +713,21 @@ int dpm_suspend_noirq(pm_message_t state
 
 	suspend_device_irqs();
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
-		error = device_suspend_noirq(dev, state);
+		dev->power.status = DPM_OFF_IRQ;
+		error = device_suspend_noirq(dev);
 		if (error) {
 			pm_dev_err(dev, state, " late", error);
+			dev->power.status = DPM_OFF;
 			break;
 		}
-		dev->power.status = DPM_OFF_IRQ;
+		error = atomic_read(&async_error);
+		if (error)
+			break;
 	}
 	mutex_unlock(&dpm_list_mtx);
+	async_synchronize_full();
 	if (error)
 		dpm_resume_noirq(resume_event(state));
 	return error;
@@ -628,10 +739,11 @@ EXPORT_SYMBOL_GPL(dpm_suspend_noirq);
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
  */
-static int device_suspend(struct device *dev, pm_message_t state)
+static int __device_suspend(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
+	down_write(&dev->power.rwsem);
 	down(&dev->sem);
 
 	if (dev->class) {
@@ -668,10 +780,50 @@ static int device_suspend(struct device
 	}
  End:
 	up(&dev->sem);
+	up_write(&dev->power.rwsem);
+	if (dev->parent)
+		up_read(&dev->parent->power.rwsem);
 
 	return error;
 }
 
+static void async_suspend(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error = atomic_read(&async_error);
+
+	if (error) {
+		if (dev->parent)
+			up_read(&dev->parent->power.rwsem);
+		dev->power.status = DPM_SUSPENDING;
+		goto End;
+	}
+
+	error = __device_suspend(dev, pm_transition);
+	if (error) {
+		pm_dev_err(dev, pm_transition, " async", error);
+		dev->power.status = DPM_SUSPENDING;
+		atomic_set(&async_error, error);
+	}
+
+ End:
+	put_device(dev);
+}
+
+static int device_suspend(struct device *dev, pm_message_t state)
+{
+	if (dev->parent)
+		down_read(&dev->parent->power.rwsem);
+
+	if (dev->power.async_suspend) {
+		get_device(dev);
+		async_schedule(async_suspend, dev);
+		return 0;
+	}
+
+	return __device_suspend(dev, pm_transition);
+}
+
 /**
  * dpm_suspend - Execute "suspend" callbacks for all non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -683,10 +835,12 @@ static int dpm_suspend(pm_message_t stat
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.prev);
 
 		get_device(dev);
+		dev->power.status = DPM_OFF;
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_suspend(dev, state);
@@ -694,16 +848,22 @@ static int dpm_suspend(pm_message_t stat
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			pm_dev_err(dev, state, "", error);
+			dev->power.status = DPM_SUSPENDING;
 			put_device(dev);
 			break;
 		}
-		dev->power.status = DPM_OFF;
 		if (!list_empty(&dev->power.entry))
 			list_move(&dev->power.entry, &list);
 		put_device(dev);
+		error = atomic_read(&async_error);
+		if (error)
+			break;
 	}
 	list_splice(&list, dpm_list.prev);
 	mutex_unlock(&dpm_list_mtx);
+	async_synchronize_full();
+	if (!error)
+		error = atomic_read(&async_error);
 	return error;
 }
 
@@ -762,6 +922,7 @@ static int dpm_prepare(pm_message_t stat
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
 	transition_started = true;
+	atomic_set(&async_error, 0);
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);
 
Index: linux-2.6/include/linux/resume-trace.h
===================================================================
--- linux-2.6.orig/include/linux/resume-trace.h
+++ linux-2.6/include/linux/resume-trace.h
@@ -6,6 +6,11 @@
 
 extern int pm_trace_enabled;
 
+static inline int pm_trace_is_enabled(void)
+{
+	return pm_trace_enabled;
+}
+
 struct device;
 extern void set_trace_device(struct device *);
 extern void generate_resume_trace(const void *tracedata, unsigned int user);
@@ -17,6 +22,8 @@ extern void generate_resume_trace(const
 
 #else
 
+static inline int pm_trace_is_enabled(void) { return 0; }
+
 #define TRACE_DEVICE(dev) do { } while (0)
 #define TRACE_RESUME(dev) do { } while (0)
 
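The locking order used by the rwsem patch can be modelled outside the kernel. The following is a minimal single-threaded sketch, not kernel code: `toy_rwsem`, `toy_dev` and the `toy_*` helpers are made-up names, and the trylock calls stand in for the blocking down_read()/down_write() so that the example stays deterministic. It only illustrates why a parent cannot finish suspending while a child still holds the read lock that was taken on the parent before the child's work was scheduled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy, non-blocking stand-in for a kernel rw_semaphore (illustration only). */
struct toy_rwsem {
	int readers;	/* number of read holders */
	bool writer;	/* true while a writer holds the lock */
};

/* Hypothetical device with just the fields the locking scheme needs. */
struct toy_dev {
	struct toy_dev *parent;
	struct toy_rwsem rwsem;
};

static bool toy_down_read_trylock(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

static bool toy_down_write_trylock(struct toy_rwsem *s)
{
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

static void toy_up_write(struct toy_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
}

/* Main-thread half of suspend: pin the parent before scheduling the work. */
static bool toy_suspend_begin(struct toy_dev *dev)
{
	return dev->parent ? toy_down_read_trylock(&dev->parent->rwsem) : true;
}

/*
 * Async half of suspend. Returns false when the device would have to keep
 * waiting because a child still holds a read lock on it.
 */
static bool toy_suspend_finish(struct toy_dev *dev)
{
	if (!toy_down_write_trylock(&dev->rwsem))
		return false;
	/* ... the device's suspend callback would run here ... */
	toy_up_write(&dev->rwsem);
	if (dev->parent)
		toy_up_read(&dev->parent->rwsem);
	return true;
}
```

With a hub and one port below it, toy_suspend_finish() on the hub reports that it must wait until toy_suspend_finish() on the port has dropped the read lock, which is the children-first ordering the patch relies on.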

* Re: Async suspend-resume patch w/ rwsems (was: Re: [GIT PULL] PM updates for 2.6.33)
  2009-12-08 22:55 ` Async suspend-resume patch w/ rwsems " Rafael J. Wysocki
@ 2009-12-08 23:24   ` Rafael J. Wysocki
  2009-12-09 20:15   ` Alan Stern
  1 sibling, 0 replies; 235+ messages in thread
From: Rafael J. Wysocki @ 2009-12-08 23:24 UTC
To: Linus Torvalds
Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Tuesday 08 December 2009, Rafael J. Wysocki wrote:
> On Tuesday 08 December 2009, Rafael J. Wysocki wrote:
> > On Tuesday 08 December 2009, Rafael J. Wysocki wrote:
> > > On Tuesday 08 December 2009, Linus Torvalds wrote:
> > > >
> > > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote:
> > > > >
> > > > > Anyway, if we use an rwsem, it won't be checkable from interrupt context just
> > > > > as well.
> > > >
> > > > You can't do a lock() from an interrupt, but the unlocks should be
> > > > irq-safe.
> > > >
> > > > > Suppose we use rwsem and during suspend each child uses a down_read() on a
> > > > > parent and then the parent uses down_write() on itself.  What if, whatever the
> > > > > reason, the parent is a bit early and does the down_write() before one of the
> > > > > children has a chance to do the down_read()?  Aren't we toast?
> > > >
> > > > We're toast, but we're toast for a totally unrelated reason: it means that
> > > > you tried to resume a child before a parent, which would be a major bug to
> > > > begin with.
> > > >
> > > > Look, I even wrote out the comments, so let me repeat the code one more
> > > > time.
> > > >
> > > >  - suspend time calling:
> > > > 	// This won't block, because we suspend nodes before parents
> > > > 	down_read(node->parent->lock);
> > > > 	// Do the part that may block asynchronously
> > > > 	async_schedule(do_usb_node_suspend, node);
> > > >
> > > >  - resume time calling:
> > > > 	// This won't block, because we resume parents before children,
> > > > 	// and the children will take the read lock.
> > > > 	down_write(leaf->lock);
> > > > 	// Do the blocking part asynchronously
> > > > 	async_schedule(usb_node_resume, leaf);
> > > >
> > > > See? So when we take the parent lock for suspend, we are guaranteed to do
> > > > so _before_ the parent node itself suspends. And conversely, when we take
> > > > the parent lock (asynchronously) for resume, we're guaranteed to do that
> > > > _after_ the parent node has done its own down_write.
> > > >
> > > > And that all depends on just one trivial thing; that the suspend and
> > > > resume is called in the right order (children first vs parent first
> > > > respectively). And that is such a _major_ correctness issue that if that
> > > > isn't correct, your suspend isn't going to work _anyway_.
> > >
> > > Understood (I think).
> > >
> > > Let's try it, then.  Below is the resume patch based on my previous one in this
> > > thread (I have only verified that it builds).
> >
> > Ah, I need to check if dev->parent is not NULL before trying to lock it, but
> > apart from this it doesn't break things at least.
>
> For completeness, below is the full async suspend/resume patch with rwsems,
> that has been (very slightly) tested and doesn't seem to break things.
>
> [Note to Alan: lockdep doesn't seem to complain about the not annotated nested
> locks.]

BTW, I can easily change it so that it uses completions for synchronization,
but I'm not sure if that's worth spending time on, so please let me know.

Rafael

* Re: Async suspend-resume patch w/ rwsems (was: Re: [GIT PULL] PM updates for 2.6.33)
  2009-12-08 22:55 ` Async suspend-resume patch w/ rwsems " Rafael J. Wysocki
  2009-12-08 23:24   ` Rafael J. Wysocki
@ 2009-12-09 20:15   ` Alan Stern
  2009-12-09 22:18     ` Rafael J. Wysocki
  1 sibling, 1 reply; 235+ messages in thread
From: Alan Stern @ 2009-12-09 20:15 UTC
To: Rafael J. Wysocki
Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Tue, 8 Dec 2009, Rafael J. Wysocki wrote:

> For completeness, below is the full async suspend/resume patch with rwsems,
> that has been (very slightly) tested and doesn't seem to break things.
>
> [Note to Alan: lockdep doesn't seem to complain about the not annotated nested
> locks.]

I can't imagine why not.  And wouldn't lockdep get confused by the fact
that in the async case, the rwsems are released by a different process
from the one that acquired them?

> Index: linux-2.6/drivers/base/power/main.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/power/main.c
> +++ linux-2.6/drivers/base/power/main.c

Should we have an attribute under /sys/power to disable async
suspend/resume?  It would make testing easier and give people a way to
work around problems.

> @@ -334,25 +337,53 @@ static void pm_dev_err(struct device *de
>   * The driver of @dev will not receive interrupts while this function is being
>   * executed.
>   */
> -static int device_resume_noirq(struct device *dev, pm_message_t state)
> +static int __device_resume_noirq(struct device *dev, pm_message_t state)
>  {

Do you want to use async tasks in the late-suspend/early-resume stages?
I know that USB won't use it, not even for the PCI host controllers --
not unless the PCI core specifically wants it.  Doing just the regular
suspend/resume stages may be enough.

> +static int device_resume_noirq(struct device *dev)
> +{
> +	down_write(&dev->power.rwsem);
> +
> +	if (dev->power.async_suspend && !pm_trace_is_enabled()) {

If the sysfs attribute exists, then maybe we _should_ allow async with
PM tracing enabled.  I don't know; it's your decision.

		atomic_set(&async_error, error);
	}

> @@ -683,10 +835,12 @@ static int dpm_suspend(pm_message_t stat
>
>  	INIT_LIST_HEAD(&list);
>  	mutex_lock(&dpm_list_mtx);
> +	pm_transition = state;
>  	while (!list_empty(&dpm_list)) {
>  		struct device *dev = to_device(dpm_list.prev);
>
>  		get_device(dev);
> +		dev->power.status = DPM_OFF;

What's that for?  dev->power.status is supposed to be DPM_SUSPENDING
until the suspend method is successfully completed.

>  		mutex_unlock(&dpm_list_mtx);
>
>  		error = device_suspend(dev, state);
> @@ -694,16 +848,22 @@ static int dpm_suspend(pm_message_t stat
>  		mutex_lock(&dpm_list_mtx);
>  		if (error) {
>  			pm_dev_err(dev, state, "", error);
> +			dev->power.status = DPM_SUSPENDING;

And then this isn't needed.

>  			put_device(dev);
>  			break;
>  		}
> -		dev->power.status = DPM_OFF;

This line has to be moved into __device_suspend(), even though it won't
be protected by dpm_list_mtx.  The same sort of thing applies to
dpm_suspend_noirq() (although nothing needs to be moved if you don't
make it async).

The rest looks okay.

How about exporting a wait_for_device_to_resume() routine?  Drivers
could call it for non-tree resume constraints:

	void wait_for_device_to_resume(struct device *other)
	{
		down_read(&other->power.rwsem);
		up_read(&other->power.rwsem);
	}

Unfortunately there is no equivalent for non-tree suspend constraints.

Alan Stern

* Re: Async suspend-resume patch w/ rwsems (was: Re: [GIT PULL] PM updates for 2.6.33)
  2009-12-09 20:15 ` Alan Stern
@ 2009-12-09 22:18   ` Rafael J. Wysocki
  2009-12-09 22:38     ` Alan Stern
  0 siblings, 1 reply; 235+ messages in thread
From: Rafael J. Wysocki @ 2009-12-09 22:18 UTC
To: Alan Stern
Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Wednesday 09 December 2009, Alan Stern wrote:
> On Tue, 8 Dec 2009, Rafael J. Wysocki wrote:
>
> > For completeness, below is the full async suspend/resume patch with rwsems,
> > that has been (very slightly) tested and doesn't seem to break things.
> >
> > [Note to Alan: lockdep doesn't seem to complain about the not annotated nested
> > locks.]
>
> I can't imagine why not.  And wouldn't lockdep get confused by the fact
> that in the async case, the rwsems are released by a different process
> from the one that acquired them?

/me looks at the .config

I have CONFIG_LOCKDEP_SUPPORT set, is there anything else I need to set
in .config?

> > Index: linux-2.6/drivers/base/power/main.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/base/power/main.c
> > +++ linux-2.6/drivers/base/power/main.c
>
> Should we have an attribute under /sys/power to disable async
> suspend/resume?  It would make testing easier and give people a way to
> work around problems.

I have a separate patch adding that, but I'd prefer to focus on the core
feature first, if possible.

> > @@ -334,25 +337,53 @@ static void pm_dev_err(struct device *de
> >   * The driver of @dev will not receive interrupts while this function is being
> >   * executed.
> >   */
> > -static int device_resume_noirq(struct device *dev, pm_message_t state)
> > +static int __device_resume_noirq(struct device *dev, pm_message_t state)
> >  {
>
> Do you want to use async tasks in the late-suspend/early-resume stages?
> I know that USB won't use it, not even for the PCI host controllers --
> not unless the PCI core specifically wants it.  Doing just the regular
> suspend/resume stages may be enough.

I guess so.  It's a leftover from the time I thought PCI might use async
suspend, but it didn't really speed up things at all AFAICS.

I think I'll remove it for now and it's going to be trivial to add it back if
desired.

> > +static int device_resume_noirq(struct device *dev)
> > +{
> > +	down_write(&dev->power.rwsem);
> > +
> > +	if (dev->power.async_suspend && !pm_trace_is_enabled()) {
>
> If the sysfs attribute exists, then maybe we _should_ allow async with
> PM tracing enabled.  I don't know; it's your decision.

I don't think it would be reliable in that case, because the RTC might be
written to by two concurrent threads at the same time.

> 		atomic_set(&async_error, error);
> 	}
>
> > @@ -683,10 +835,12 @@ static int dpm_suspend(pm_message_t stat
> >
> >  	INIT_LIST_HEAD(&list);
> >  	mutex_lock(&dpm_list_mtx);
> > +	pm_transition = state;
> >  	while (!list_empty(&dpm_list)) {
> >  		struct device *dev = to_device(dpm_list.prev);
> >
> >  		get_device(dev);
> > +		dev->power.status = DPM_OFF;
>
> What's that for?  dev->power.status is supposed to be DPM_SUSPENDING
> until the suspend method is successfully completed.

If the suspend is run asynchronously, the main thread will always get a
"success" from device_suspend(), so it can't change power.status on this
basis.  I thought we could set power.status to DPM_OFF upfront and change
it back when error is returned.

The alternative would be to move the modification of power.status to
device_suspend() and async_suspend().  Well, maybe that's better.

> >  		mutex_unlock(&dpm_list_mtx);
> >
> >  		error = device_suspend(dev, state);
> > @@ -694,16 +848,22 @@ static int dpm_suspend(pm_message_t stat
> >  		mutex_lock(&dpm_list_mtx);
> >  		if (error) {
> >  			pm_dev_err(dev, state, "", error);
> > +			dev->power.status = DPM_SUSPENDING;
>
> And then this isn't needed.
>
> >  			put_device(dev);
> >  			break;
> >  		}
> > -		dev->power.status = DPM_OFF;
>
> This line has to be moved into __device_suspend(), even though it won't
> be protected by dpm_list_mtx.  The same sort of thing applies to
> dpm_suspend_noirq() (although nothing needs to be moved if you don't
> make it async).
>
> The rest looks okay.

Still, I think I'd rework it to use completions for the reason described in the
message I've just sent (in short, because of the off-tree dependencies
problem).

> How about exporting a wait_for_device_to_resume() routine?  Drivers
> could call it for non-tree resume constraints:
>
> 	void wait_for_device_to_resume(struct device *other)
> 	{
> 		down_read(&other->power.rwsem);
> 		up_read(&other->power.rwsem);
> 	}
>
> Unfortunately there is no equivalent for non-tree suspend constraints.

If we use completions, it will be possible to just export something like

dpm_wait(dev)
{
	if (dev)
		wait_for_completion(dev->power.completion);
}

I think.  It appears that will also work for suspend, unless I'm missing
something.

Rafael

* Re: Async suspend-resume patch w/ rwsems (was: Re: [GIT PULL] PM updates for 2.6.33)
  2009-12-09 22:18 ` Rafael J. Wysocki
@ 2009-12-09 22:38   ` Alan Stern
  2009-12-09 23:18     ` Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) Rafael J. Wysocki
  0 siblings, 1 reply; 235+ messages in thread
From: Alan Stern @ 2009-12-09 22:38 UTC
To: Rafael J. Wysocki
Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Wed, 9 Dec 2009, Rafael J. Wysocki wrote:

> On Wednesday 09 December 2009, Alan Stern wrote:
> > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote:
> >
> > > For completeness, below is the full async suspend/resume patch with rwsems,
> > > that has been (very slightly) tested and doesn't seem to break things.
> > >
> > > [Note to Alan: lockdep doesn't seem to complain about the not annotated nested
> > > locks.]
> >
> > I can't imagine why not.  And wouldn't lockdep get confused by the fact
> > that in the async case, the rwsems are released by a different process
> > from the one that acquired them?
>
> /me looks at the .config
>
> I have CONFIG_LOCKDEP_SUPPORT set, is there anything else I need to set
> in .config?

How about CONFIG_PROVE_LOCKING?  If lockdep really does start
complaining then switching to completions would be a simple way to
appease it.

> > > @@ -683,10 +835,12 @@ static int dpm_suspend(pm_message_t stat
> > >
> > >  	INIT_LIST_HEAD(&list);
> > >  	mutex_lock(&dpm_list_mtx);
> > > +	pm_transition = state;
> > >  	while (!list_empty(&dpm_list)) {
> > >  		struct device *dev = to_device(dpm_list.prev);
> > >
> > >  		get_device(dev);
> > > +		dev->power.status = DPM_OFF;
> >
> > What's that for?  dev->power.status is supposed to be DPM_SUSPENDING
> > until the suspend method is successfully completed.
>
> If the suspend is run asynchronously, the main thread will always get a
> "success" from device_suspend(), so it can't change power.status on this
> basis.  I thought we could set power.status to DPM_OFF upfront and change
> it back when error is returned.
>
> The alternative would be to move the modification of power.status to
> device_suspend() and async_suspend().  Well, maybe that's better.

Yes, I think so.  Or into __device_suspend().  And the same thing in
dpm_suspend_noirq().

> > How about exporting a wait_for_device_to_resume() routine?  Drivers
> > could call it for non-tree resume constraints:
> >
> > 	void wait_for_device_to_resume(struct device *other)
> > 	{
> > 		down_read(&other->power.rwsem);
> > 		up_read(&other->power.rwsem);
> > 	}
> >
> > Unfortunately there is no equivalent for non-tree suspend constraints.
>
> If we use completions, it will be possible to just export something like
>
> dpm_wait(dev)
> {
> 	if (dev)
> 		wait_for_completion(dev->power.completion);
> }
>
> I think.  It appears that will also work for suspend, unless I'm missing
> something.

It will.

Alan Stern
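The point both sides settle on here, that waiting on per-device completions covers the suspend direction as well as the resume direction, can be checked with a small model. What follows is a single-threaded sketch under stated assumptions, not the patch's code: `sim_dev`, `can_resume()`, `can_suspend()`, `run_phase()` and `pos()` are illustrative names, a boolean `done` flag stands in for `dev->power.completion`, and a skipped device stands in for a thread blocked in wait_for_completion(). Devices are offered to the worker in a deliberately wrong order to show that the waits alone restore the required ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NDEV 4

/* Toy device: "done" stands in for dev->power.completion (hypothetical model). */
struct sim_dev {
	int parent;	/* index of the parent device, or -1 for a root */
	bool done;	/* set once this device's PM callback has finished */
};

/* Resume constraint: a device may run only after its parent is done. */
static bool can_resume(const struct sim_dev *devs, int i)
{
	assert(devs[i].parent < NDEV);
	return devs[i].parent < 0 || devs[devs[i].parent].done;
}

/* Suspend constraint: a device may run only after all its children are done. */
static bool can_suspend(const struct sim_dev *devs, int i)
{
	for (int c = 0; c < NDEV; c++)
		if (devs[c].parent == i && !devs[c].done)
			return false;
	return true;
}

/*
 * Repeatedly offer the devices to a "worker" in the order given by try_order.
 * A device whose constraint is not met yet is skipped, which models a thread
 * blocked in wait_for_completion(). The finishing order is written to out[].
 * Assumes the parent links form a tree (no cycles), so every sweep makes
 * progress.
 */
static void run_phase(struct sim_dev *devs, const int *try_order, bool suspend,
		      int *out)
{
	int finished = 0;

	for (int i = 0; i < NDEV; i++)
		devs[i].done = false;	/* like re-arming the completions */

	while (finished < NDEV) {
		for (int k = 0; k < NDEV; k++) {
			int i = try_order[k];

			if (devs[i].done)
				continue;
			if (suspend ? !can_suspend(devs, i) : !can_resume(devs, i))
				continue;
			devs[i].done = true;	/* models complete_all() */
			out[finished++] = i;
		}
	}
}

/* Position of device dev in a finishing order, or -1 if absent. */
static int pos(const int *order, int dev)
{
	for (int k = 0; k < NDEV; k++)
		if (order[k] == dev)
			return k;
	return -1;
}
```

With a root (0), two children (1 and 2) and a grandchild (3, below 1), offering the devices as 3, 2, 1, 0 for resume still finishes the root first, and offering them as 0, 1, 2, 3 for suspend still finishes the root last.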

* Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-09 22:38 ` Alan Stern
@ 2009-12-09 23:18   ` Rafael J. Wysocki
  2009-12-10  2:51     ` Linus Torvalds
  2009-12-10 15:31     ` Alan Stern
  0 siblings, 2 replies; 235+ messages in thread
From: Rafael J. Wysocki @ 2009-12-09 23:18 UTC
To: Alan Stern
Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Wednesday 09 December 2009, Alan Stern wrote:
> On Wed, 9 Dec 2009, Rafael J. Wysocki wrote:
>
> > On Wednesday 09 December 2009, Alan Stern wrote:
> > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote:
> > >
> > > > For completeness, below is the full async suspend/resume patch with rwsems,
> > > > that has been (very slightly) tested and doesn't seem to break things.
> > > >
> > > > [Note to Alan: lockdep doesn't seem to complain about the not annotated nested
> > > > locks.]
> > >
> > > I can't imagine why not.  And wouldn't lockdep get confused by the fact
> > > that in the async case, the rwsems are released by a different process
> > > from the one that acquired them?
> >
> > /me looks at the .config
> >
> > I have CONFIG_LOCKDEP_SUPPORT set, is there anything else I need to set
> > in .config?
>
> How about CONFIG_PROVE_LOCKING?  If lockdep really does start
> complaining then switching to completions would be a simple way to
> appease it.

Ah, that one is not set.  I guess I'll try it later, although I've already
decided to use completions anyway.

...
> > > How about exporting a wait_for_device_to_resume() routine?  Drivers
> > > could call it for non-tree resume constraints:
> > >
> > > 	void wait_for_device_to_resume(struct device *other)
> > > 	{
> > > 		down_read(&other->power.rwsem);
> > > 		up_read(&other->power.rwsem);
> > > 	}
> > >
> > > Unfortunately there is no equivalent for non-tree suspend constraints.
> >
> > If we use completions, it will be possible to just export something like
> >
> > dpm_wait(dev)
> > {
> > 	if (dev)
> > 		wait_for_completion(dev->power.completion);
> > }
> >
> > I think.  It appears that will also work for suspend, unless I'm missing
> > something.
>
> It will.

Completions it is, then.

Additionally, I've removed the async support from the _noirq parts and
moved the setting of power.status on suspend to __device_suspend().

The result is appended.

Rafael

---
 drivers/base/power/main.c    |  124 ++++++++++++++++++++++++++++++++++++++++---
 include/linux/device.h       |    6 ++
 include/linux/pm.h           |   12 ++++
 include/linux/resume-trace.h |    7 ++
 4 files changed, 143 insertions(+), 6 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -26,6 +26,7 @@
 #include <linux/spinlock.h>
 #include <linux/wait.h>
 #include <linux/timer.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -412,9 +413,11 @@ struct dev_pm_info { pm_message_t power_state; unsigned int can_wakeup:1; unsigned int should_wakeup:1; + unsigned async_suspend:1; enum dpm_state status; /* Owned by the PM core */ #ifdef CONFIG_PM_SLEEP struct list_head entry; + struct completion completion; #endif #ifdef CONFIG_PM_RUNTIME struct timer_list suspend_timer; @@ -508,6 +511,13 @@ extern void __suspend_report_result(cons __suspend_report_result(__func__, fn, ret); \ } while (0) +extern int __dpm_wait(struct device *dev, void *ign); + +static inline void dpm_wait(struct device *dev) +{ + __dpm_wait(dev, NULL); +} + #else /* !CONFIG_PM_SLEEP */ #define device_pm_lock() do {} while (0) @@ -520,6 +530,8 @@ static inline int dpm_suspend_start(pm_m #define suspend_report_result(fn, ret) do {} while (0) +static inline void dpm_wait(struct device *dev) {} + #endif /* !CONFIG_PM_SLEEP */ /* How to reorder dpm_list after device_move() */ Index: linux-2.6/drivers/base/power/main.c =================================================================== --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -25,6 +25,7 @@ #include <linux/resume-trace.h> #include <linux/rwsem.h> #include <linux/interrupt.h> +#include <linux/async.h> #include "../base.h" #include "power.h" @@ -42,6 +43,7 @@ LIST_HEAD(dpm_list); static DEFINE_MUTEX(dpm_list_mtx); +static pm_message_t pm_transition; /* * Set once the preparation of devices for a PM transition has started, reset @@ -56,6 +58,7 @@ static bool transition_started; void device_pm_init(struct device *dev) { dev->power.status = DPM_ON; + init_completion(&dev->power.completion); pm_runtime_init(dev); } @@ -162,6 +165,39 @@ void device_pm_move_last(struct device * } /** + * __dpm_wait - Wait for a PM operation to complete. + * @dev: Device to wait for. + * @ign: This value is not used by the function. 
+ */ +int __dpm_wait(struct device *dev, void *ign) +{ + if (dev) + wait_for_completion(&dev->power.completion); + return 0; +} +EXPORT_SYMBOL_GPL(__dpm_wait); + +static void dpm_wait_for_children(struct device *dev) +{ + device_for_each_child(dev, NULL, __dpm_wait); +} + +/** + * dpm_synchronize - Wait for PM callbacks of all devices to complete. + */ +static void dpm_synchronize(void) +{ + struct device *dev; + + async_synchronize_full(); + + mutex_lock(&dpm_list_mtx); + list_for_each_entry(dev, &dpm_list, power.entry) + INIT_COMPLETION(dev->power.completion); + mutex_unlock(&dpm_list_mtx); +} + +/** * pm_op - Execute the PM operation appropriate for given PM event. * @dev: Device to handle. * @ops: PM operations to choose from. @@ -381,17 +417,18 @@ void dpm_resume_noirq(pm_message_t state EXPORT_SYMBOL_GPL(dpm_resume_noirq); /** - * device_resume - Execute "resume" callbacks for given device. + * __device_resume - Execute "resume" callbacks for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. 
  */
-static int device_resume(struct device *dev, pm_message_t state)
+static int __device_resume(struct device *dev, pm_message_t state)
 {
 	int error = 0;

 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);

+	dpm_wait(dev->parent);
 	down(&dev->sem);

 	if (dev->bus) {
@@ -426,11 +463,34 @@ static int device_resume(struct device *
 	}
  End:
 	up(&dev->sem);
+	complete_all(&dev->power.completion);

 	TRACE_RESUME(error);
 	return error;
 }

+static void async_resume(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error;
+
+	error = __device_resume(dev, pm_transition);
+	if (error)
+		pm_dev_err(dev, pm_transition, " async", error);
+	put_device(dev);
+}
+
+static int device_resume(struct device *dev)
+{
+	if (dev->power.async_suspend && !pm_trace_is_enabled()) {
+		get_device(dev);
+		async_schedule(async_resume, dev);
+		return 0;
+	}
+
+	return __device_resume(dev, pm_transition);
+}
+
 /**
  * dpm_resume - Execute "resume" callbacks for non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -444,6 +504,7 @@ static void dpm_resume(pm_message_t stat

 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);

@@ -454,7 +515,7 @@ static void dpm_resume(pm_message_t stat
 			dev->power.status = DPM_RESUMING;
 			mutex_unlock(&dpm_list_mtx);

-			error = device_resume(dev, state);
+			error = device_resume(dev);

 			mutex_lock(&dpm_list_mtx);
 			if (error)
@@ -469,6 +530,7 @@ static void dpm_resume(pm_message_t stat
 	}
 	list_splice(&list, &dpm_list);
 	mutex_unlock(&dpm_list_mtx);
+	dpm_synchronize();
 }

 /**
@@ -533,6 +595,8 @@ static void dpm_complete(pm_message_t st
 	mutex_unlock(&dpm_list_mtx);
 }

+static atomic_t async_error;
+
 /**
  * dpm_resume_end - Execute "resume" callbacks and complete system transition.
  * @state: PM transition of the system being carried out.
@@ -628,10 +692,11 @@ EXPORT_SYMBOL_GPL(dpm_suspend_noirq);
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
  */
-static int device_suspend(struct device *dev, pm_message_t state)
+static int __device_suspend(struct device *dev, pm_message_t state)
 {
 	int error = 0;

+	dpm_wait_for_children(dev);
 	down(&dev->sem);

 	if (dev->class) {
@@ -666,12 +731,50 @@ static int device_suspend(struct device
 			suspend_report_result(dev->bus->suspend, error);
 		}
 	}
+
+	if (!error)
+		dev->power.status = DPM_OFF;
+
  End:
 	up(&dev->sem);
+	complete_all(&dev->power.completion);

 	return error;
 }

+static void async_suspend(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error = atomic_read(&async_error);
+
+	if (error) {
+		complete_all(&dev->power.completion);
+		goto End;
+	}
+
+	error = __device_suspend(dev, pm_transition);
+	if (error) {
+		pm_dev_err(dev, pm_transition, " async", error);
+		atomic_set(&async_error, error);
+	}
+
+ End:
+	put_device(dev);
+}
+
+static int device_suspend(struct device *dev, pm_message_t state)
+{
+	int error;
+
+	if (dev->power.async_suspend) {
+		get_device(dev);
+		async_schedule(async_suspend, dev);
+		return 0;
+	}
+
+	return __device_suspend(dev, pm_transition);
+}
+
 /**
  * dpm_suspend - Execute "suspend" callbacks for all non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -683,6 +786,7 @@ static int dpm_suspend(pm_message_t stat

 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.prev);

@@ -697,13 +801,18 @@ static int dpm_suspend(pm_message_t stat
 			put_device(dev);
 			break;
 		}
-		dev->power.status = DPM_OFF;
 		if (!list_empty(&dev->power.entry))
 			list_move(&dev->power.entry, &list);
 		put_device(dev);
+		error = atomic_read(&async_error);
+		if (error)
+			break;
 	}
 	list_splice(&list, dpm_list.prev);
 	mutex_unlock(&dpm_list_mtx);
+	dpm_synchronize();
+	if (!error)
+		error = atomic_read(&async_error);
 	return error;
 }

@@ -762,6 +871,7 @@ static int dpm_prepare(pm_message_t stat
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
 	transition_started = true;
+	atomic_set(&async_error, 0);
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);

@@ -793,8 +903,10 @@ static int dpm_prepare(pm_message_t stat
 			break;
 		}
 		dev->power.status = DPM_SUSPENDING;
-		if (!list_empty(&dev->power.entry))
+		if (!list_empty(&dev->power.entry)) {
 			list_move_tail(&dev->power.entry, &list);
+			INIT_COMPLETION(dev->power.completion);
+		}
 		put_device(dev);
 	}
 	list_splice(&list, &dpm_list);
Index: linux-2.6/include/linux/resume-trace.h
===================================================================
--- linux-2.6.orig/include/linux/resume-trace.h
+++ linux-2.6/include/linux/resume-trace.h
@@ -6,6 +6,11 @@

 extern int pm_trace_enabled;

+static inline int pm_trace_is_enabled(void)
+{
+	return pm_trace_enabled;
+}
+
 struct device;
 extern void set_trace_device(struct device *);
 extern void generate_resume_trace(const void *tracedata, unsigned int user);
@@ -17,6 +22,8 @@ extern void generate_resume_trace(const

 #else

+static inline int pm_trace_is_enabled(void) { return 0; }
+
 #define TRACE_DEVICE(dev) do { } while (0)
 #define TRACE_RESUME(dev) do { } while (0)

Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -472,6 +472,12 @@ static inline int device_is_registered(s
 	return dev->kobj.state_in_sysfs;
 }

+static inline void device_enable_async_suspend(struct device *dev, bool enable)
+{
+	if (dev->power.status == DPM_ON)
+		dev->power.async_suspend = enable;
+}
+
 void driver_init(void);

 /*
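
The ordering rule in the patch above (a device waits for all of its children before it suspends, then signals its own completion) can be tried outside the kernel. What follows is only a userspace sketch under stated assumptions: struct toy_completion, struct toy_device and every toy_* function are hypothetical stand-ins built on pthreads, not kernel code, and a single parent with two children is assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct toy_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void toy_init_completion(struct toy_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Like complete_all(): release every current and future waiter. */
static void toy_complete_all(struct toy_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Like wait_for_completion(): block until toy_complete_all() has run. */
static void toy_wait_for_completion(struct toy_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

#define NR_CHILDREN 2

struct toy_device {
	int id;
	struct toy_completion completion;
	struct toy_device *children[NR_CHILDREN];
	size_t nr_children;
};

static pthread_mutex_t order_lock = PTHREAD_MUTEX_INITIALIZER;
static int suspend_order[NR_CHILDREN + 1];
static size_t nr_suspended;

/* Mirrors __device_suspend(): wait for the children, "suspend", complete. */
static void *toy_suspend(void *data)
{
	struct toy_device *dev = data;
	size_t i;

	for (i = 0; i < dev->nr_children; i++)
		toy_wait_for_completion(&dev->children[i]->completion);

	pthread_mutex_lock(&order_lock);
	suspend_order[nr_suspended++] = dev->id;
	pthread_mutex_unlock(&order_lock);

	toy_complete_all(&dev->completion);
	return NULL;
}

/* Suspend a parent (id 0) and its children concurrently and return the id
 * of the device that suspended last. */
int toy_suspend_tree(void)
{
	struct toy_device parent, kids[NR_CHILDREN];
	pthread_t threads[NR_CHILDREN + 1];
	size_t i;

	nr_suspended = 0;
	parent.id = 0;
	parent.nr_children = NR_CHILDREN;
	toy_init_completion(&parent.completion);
	for (i = 0; i < NR_CHILDREN; i++) {
		kids[i].id = (int)i + 1;
		kids[i].nr_children = 0;
		toy_init_completion(&kids[i].completion);
		parent.children[i] = &kids[i];
	}

	/* Start the parent first: it still has to finish last. */
	pthread_create(&threads[0], NULL, toy_suspend, &parent);
	for (i = 0; i < NR_CHILDREN; i++)
		pthread_create(&threads[i + 1], NULL, toy_suspend, &kids[i]);
	for (i = 0; i <= NR_CHILDREN; i++)
		pthread_join(threads[i], NULL);

	return suspend_order[NR_CHILDREN];
}
```

Calling toy_suspend_tree() from a small main() shows that the parent is always recorded last, whatever order the threads are scheduled in. Resume would be the mirror image: each device waits on its parent's completion, as dpm_wait(dev->parent) does in the patch.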
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-09 23:18 ` Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) Rafael J. Wysocki @ 2009-12-10 2:51 ` Linus Torvalds 2009-12-10 19:40 ` Rafael J. Wysocki 2009-12-10 15:31 ` Alan Stern 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-10 2:51 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Thu, 10 Dec 2009, Rafael J. Wysocki wrote: > > Completions it is, then. What was so hard with the "Try the simple one first" to understand? You had a simpler working patch, why are you making this more complex one without ever having had any problems with the simpler one? Btw, your 'atomic_set()' with errors is pure voodoo programming. That's not how atomics work. They do SMP-atomic addition etc, the 'atomic_set()' and 'atomic_read()' things are not in any way more atomic than any other access. They are meant for racy reads (atomic_read()) and for initializations (atomic_set()), and the way you use them that 'atomic' part is entirely pointless, because it really isn't anything different from an 'int', except that it may be very very expensive on some architectures due to hashed spinlocks etc. So stop this overdesign thing. Start simple. If you _ever_ see real problems, that's when you add stuff. As it is, any time you add complexity, you just add bugs. > +/** > + * dpm_synchronize - Wait for PM callbacks of all devices to complete. > + */ > +static void dpm_synchronize(void) > +{ > + struct device *dev; > + > + async_synchronize_full(); > + > + mutex_lock(&dpm_list_mtx); > + list_for_each_entry(dev, &dpm_list, power.entry) > + INIT_COMPLETION(dev->power.completion); > + mutex_unlock(&dpm_list_mtx); > +} And this, for example, is pretty disgusting. 
Not only is that INIT_COMPLETION purely brought on by the whole problem with completions (they are fundamentally one-shot, but you want to use them over and over so you need to re-initialize them: a nice lock wouldn't have that problem to begin with), but the comment isn't even accurate. Sure, it waits for any async jobs, but that's the _least_ of what the function actually does, so the comment is actively misleading, isn't it?

Linus
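
The distinction drawn above between read-modify-write operations and bare set/read can be illustrated in userspace. This is a rough sketch only: it uses C11 <stdatomic.h> rather than the kernel's atomic_t, so treating atomic_fetch_add() as the analogue of atomic_add() and a relaxed store as the analogue of atomic_set() is an assumption, and the thread counts and the error value are arbitrary.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NR_THREADS 4
#define NR_INCREMENTS 10000

static atomic_int counter;	/* updated with a real read-modify-write */
static atomic_int async_error;	/* only ever stored to and loaded from */

static void *worker(void *unused)
{
	int i;

	(void)unused;
	for (i = 0; i < NR_INCREMENTS; i++)
		atomic_fetch_add(&counter, 1);	/* indivisible add */

	/* A bare store: nothing here depends on the previous value, so this
	 * is "last writer wins", not a read-modify-write. The value -5 is an
	 * arbitrary stand-in for an error code. */
	atomic_store_explicit(&async_error, -5, memory_order_relaxed);
	return NULL;
}

/* Run the workers and return the final counter value. */
int run_atomic_demo(void)
{
	pthread_t threads[NR_THREADS];
	int i;

	atomic_store(&counter, 0);
	atomic_store(&async_error, 0);
	for (i = 0; i < NR_THREADS; i++)
		pthread_create(&threads[i], NULL, worker, NULL);
	for (i = 0; i < NR_THREADS; i++)
		pthread_join(threads[i], NULL);

	return atomic_load(&counter);
}
```

The counter is exact because each increment is one indivisible operation. The error variable gets nothing comparable from its type: every thread simply overwrites it, which is all the patch's atomic_set()/atomic_read() pair does as well.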
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-10 2:51 ` Linus Torvalds @ 2009-12-10 19:40 ` Rafael J. Wysocki 2009-12-10 23:30 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-10 19:40 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Thursday 10 December 2009, Linus Torvalds wrote: > > On Thu, 10 Dec 2009, Rafael J. Wysocki wrote: > > > > Completions it is, then. > > What was so hard with the "Try the simple one first" to understand? You > had a simpler working patch, why are you making this more complex one > without ever having had any problems with the simpler one? OK, why don't you just say you won't merge anything that doesn't use rwsems (although you said before that completions would be fine with you)? That would make things clear, but also it would mean we gave up handling the off-tree dependencies in general. > Btw, your 'atomic_set()' with errors is pure voodoo programming. That's > not how atomics work. They do SMP-atomic addition etc, the 'atomic_set()' > and 'atomic_read()' things are not in any way more atomic than any other > access. > > They are meant for racy reads (atomic_read()) and for initializations > (atomic_set()), and the way you use them that 'atomic' part is entirely > pointless, because it really isn't anything different from an 'int', > except that it may be very very expensive on some architectures due to > hashed spinlocks etc. > > So stop this overdesign thing. Start simple. If you _ever_ see real > problems, that's when you add stuff. As it is, any time you add > complexity, you just add bugs. OK, so that need not be atomic. > > +/** > > + * dpm_synchronize - Wait for PM callbacks of all devices to complete. 
> > + */
> > +static void dpm_synchronize(void)
> > +{
> > +	struct device *dev;
> > +
> > +	async_synchronize_full();
> > +
> > +	mutex_lock(&dpm_list_mtx);
> > +	list_for_each_entry(dev, &dpm_list, power.entry)
> > +		INIT_COMPLETION(dev->power.completion);
> > +	mutex_unlock(&dpm_list_mtx);
> > +}
>
> And this, for example, is pretty disgusting. Not only is that
> INIT_COMPLETION purely brought on by the whole problem with completions
> (they are fundamentally one-shot, but you want to use them over and over

Actually, twice. However, since I don't want to do any async handling in the _noirq phases any more, I can get rid of this whole function. Thanks for pointing that out to me.

Rafael
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-10 19:40 ` Rafael J. Wysocki @ 2009-12-10 23:30 ` Linus Torvalds 2009-12-11 1:02 ` Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-10 23:30 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Thu, 10 Dec 2009, Rafael J. Wysocki wrote: > > OK, why don't you just say you won't merge anything that doesn't use rwsems I did! Here's a quote (and it's pretty much the whole email, so it's not like it was hidden): - alpine.LFD.2.00.0912081309370.3560@localhost.localdomain: "Let me put this simply: I've told you guys how to do it simply, with _zero_ crap. No "iterating over children". No games. No data structures. No new infrastructure. Just a single new rwlock per device, and _trivial_ code. So here's the challenge: try it my simple way first. I've quoted the code about five million times already. If you _actually_ see some problems, explain them. Don't make up stupid "iterate over each child" things. Don't claim totally made-up "leads to difficulties". Don't make it any more complicated than it needs to be. Keep it simple. And once you have tried that simple approach, and you really can show why it doesn't work, THEN you can try something else. But before you try the simple approach and explain why it wouldn't work, I simply will not pull anything more complex. Understood and agreed?" And then later about completions: - alpine.LFD.2.00.0912081416470.3560@localhost.localdomain: "So I think completions should work, if done right. That whole "make the parent wait for all the children to complete" is fine in that sense. And I'll happily take such an approach if my rwlock thing doesn't work." 
IOW, I'll happily take the completions version, but dammit, I refuse to take it when there is a simpler approach that does NOT need to iterate, and does NOT need to re-initialize the data structures each round etc.

That's what I've been arguing against the whole time. It started as arguing against complex and unnecessary infrastructure, and trying to show that it _can_ be done so much simpler using existing basic locking.

And I get annoyed when you guys continually seem to want to make it more complex than it needs to be.

> > And this, for example, is pretty disgusting. Not only is that
> > INIT_COMPLETION purely brought on by the whole problem with completions
> > (they are fundamentally one-shot, but you want to use them over and over
>
> Actually, twice. However, since I don't want to do any async handling in the
> _noirq phases any more, I can get rid of this whole function. Thanks for
> pointing that out to me.

Well, my point was that you'll need to do that

	INIT_COMPLETION(dev->power.completion);

thing each suspend and each resume. Exactly because completions are designed to be "one-way" things, so you end up having to reset them each cycle (you just reset them even _more_ than you needed).

Again, my point was that using locks is actually a very _natural_ thing to do. I really don't understand what problems you and Alan have with just using locks - we have way more locks in the kernel than we have completions, so they are the "default" thing to do, and they really are very natural to use.

[ Ok, so admittedly the actual use of 'struct rw_semaphore' is pretty unusual, but my point is that people are used to locking semantics in general, more so than the semantics of completions ]

Completions were literally designed to be used for one-off things - one of the most common uses is that the 'struct completion' is on the _stack_.
It doesn't get much more one-off than that - and the completions are really very explicitly designed so that you can do a 'complete()' on something that will literally disappear from under you as you do it (because the struct completion might be on the stack of the thing that is waiting for it, and gets de-allocated when the waiter goes ahead).

That is why 'wait_for_completion()' always has to take the spinlock, for example - there is no fastpath for completion, because the races for the waiter releasing things too early are too nasty.

So completions are actually very subtle things - and you don't need any of that subtlety. I realize that from a user perspective, completions look very simple, but in many ways they actually have subtler semantics than a regular lock has.

Linus
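
The need to reset a completion every cycle, discussed above, comes down to the "done" state outliving the cycle that set it. A minimal userspace sketch, assuming a hypothetical struct toy_completion that keeps only the state flag (no waiters are involved, so no wait queue is modelled) and hypothetical toy_* helpers:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, minimal stand-in for struct completion: state flag only. */
struct toy_completion {
	pthread_mutex_t lock;
	bool done;
};

static void toy_init_completion(struct toy_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->done = false;
}

/* Plays the role of INIT_COMPLETION(): rearm an initialized object. */
static void toy_reinit_completion(struct toy_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = false;
	pthread_mutex_unlock(&c->lock);
}

/* Plays the role of complete_all(): the state stays "done" afterwards. */
static void toy_complete_all(struct toy_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_mutex_unlock(&c->lock);
}

/* Would a waiter arriving now return immediately? */
static bool toy_completion_done(struct toy_completion *c)
{
	bool done;

	pthread_mutex_lock(&c->lock);
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}

/* Returns 1 if the completion was stale before the reset and armed after. */
int toy_reinit_demo(void)
{
	struct toy_completion c;
	bool stale, rearmed;

	toy_init_completion(&c);
	toy_complete_all(&c);		/* first suspend finishes */

	/* Second cycle, no reset yet: a waiter would not block at all. */
	stale = toy_completion_done(&c);

	toy_reinit_completion(&c);	/* the per-cycle reset */
	rearmed = !toy_completion_done(&c);

	pthread_mutex_destroy(&c.lock);
	return stale && rearmed;
}
```

Without the reset, a parent in the second suspend cycle would see its children as already finished and proceed too early. A lock that is released at the end of each cycle returns to its initial state by itself, which is the asymmetry being argued about here.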
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-10 23:30 ` Linus Torvalds @ 2009-12-11 1:02 ` Rafael J. Wysocki 2009-12-11 1:25 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-11 1:02 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Friday 11 December 2009, Linus Torvalds wrote:
>
> On Thu, 10 Dec 2009, Rafael J. Wysocki wrote:
...
>
> IOW, I'll happily take the completions version, but dammit, I refuse to
> take it when there is a simpler approach that does NOT need to iterate,
> and does NOT need to re-initialize the data structures each round etc.

I don't think it really is that simple. For example, the fact that the outer lock has to be taken by one thread and released by another is not exactly straightforward. [One might ask what's the critical section in this case.]

Besides, suppose a device driver wants some off-tree constraints to be satisfied. What's the driver writer supposed to do? He can only lock the other device, but that will cause lockdep to complain, because this lock is going to be nested. Moreover, it's already too late, because his async thread has started and there's no guarantee that the other device hasn't acquired its rwsem yet.

With completions, the driver doesn't have to take any action to prevent another one from suspending too early. Instead, the other one has to wait for its suspend to complete, and for me personally this is a much more natural thing to do. IOW, if I were a driver writer, I'd probably prefer to wait on a completion than to use a lock in a tricky manner.

> That's what I've been arguing against the whole time. It started as
> arguing against complex and unnecessary infrastructure, and trying to show
> that it _can_ be done so much simpler using existing basic locking.
> > And I get annoyed when you guys continually seem to want to make it more > complex than it needs to be. > > > > And this, for example, is pretty disgusting. Not only is that > > > INIT_COMPLETION purely brought on by the whole problem with completions > > > (they are fundamentally one-shot, but you want to use them over and over > > > > Actually, twice. However, since I don't want to do any async handling in the > > _noirq phases any more, I can get rid of this whole function. Thanks for > > pointing that out to me. > > Well, my point was that you'll need to do that > > INIT_COMPLETION(dev->power.completion); > > thing each suspend and each resume. Exactly because completions are > designed to be "onw-way" things, so you end up having to reset them each > cycle (you just reset them even _more_ than you needed). Well, why actually do we need to preserve the state of the data structure from one cycle to another? There's no need whatsoever. > Again, my point was that using locks is actually a very _natural_ thing to > do. I really don't understand what problems you and Alan have with just > using locks - we have way more locks in the kernel than we have > completions, so they are the "default" thing to do, and they really are > very natural to use. > > [ Ok, so admittedly the actual use of 'struct rw_semaphore' is pretty > unusual, but my point is that people are used to locking semantics in > general, more so than the semantics of completions ] I still don't think there are many places where locks are used in a way you're suggesting. I would even say it's quite unusual to use locks this way. > Completions were literally designed to be used for one-off things - one of > the most common uses is that the 'struct completion' is on the _stack_. 
It
> doesn't get much more one-off than that - and the completions are really
> very explicitly designed so that you can do a 'complete()' on something
> that will literally disappear from under you as you do it (because the
> struct completion might be on the stack of the thing that is waiting for
> it, and gets de-allocated when the waiter goes ahead).

We could literally throw away a completion after all of the potentially waiting threads have finished their operations and then allocate it back again when necessary. We only need the synchronization in this particular phase of suspend or resume and it doesn't need to extend to the other phases or other cycles, because all of the concurrent threads we need to synchronize will only live during this one particular phase of suspend or resume. They will all exit when it's finished anyway.

> That is why 'wait_for_completion()' always has to take the spinlock, for
> example - there is no fastpath for completion, because the races for the
> waiter releasing things too early are too nasty.
>
> So completions are actually very subtle things - and you don't need any of
> that subtlety. I realize that from a user perspective, completions look
> very simple, but in many ways they actually have subtler semantics than a
> regular lock has.

Well, I guess your point is that the implementation of completions is much more complicated than we really need, but I'm not sure if that really hurts.

Rafael
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-11 1:02 ` Rafael J. Wysocki @ 2009-12-11 1:25 ` Linus Torvalds 2009-12-11 3:42 ` Alan Stern 2009-12-11 22:11 ` Rafael J. Wysocki 0 siblings, 2 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-11 1:25 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Fri, 11 Dec 2009, Rafael J. Wysocki wrote: > > I don't think it really is that simple. For example, the fact that the outer > lock has to be taken by one thread and released by another is not exactly > straightforward. [One might ask what's the critical section in this case.] Why is that any different from initializing the completion in one thread, and completing it in another? It's exactly equivalent. Completions really are "locks that were initialized to locked". That is, in fact, how completions came to be: we literally used to use semaphores for them, and the reason for completions is literally the magic lifetime rules they have. So when you do INIT_COMPLETION(dev->power.completion); that really is historically, logically, and conceptually exactly the same thing as initializing a lock to the locked state. We literally used to do it with the equivalent of init_MUTEX_LOCKED() way back when (well, except we didn't have mutexes back then, we had only counting semaphores) and instead of "complete()", we had "up()" on the semaphore to complete it. > Besides, suppose a device driver wants some off-tree constraints to be > satisfied. .. and I've told you several times that we should simply not do such devices asynchronously. At least not unless there is some _overriding_ reason to. And so far, nobody has suggested anything even remotely likely for that. Again - KISS: Keep It Simple, Stupid! Don't try to make up problems. The _only_ subsystem we know wants this is USB, and we know USB is purely a tree. 
> > 	INIT_COMPLETION(dev->power.completion);
> >
> > thing each suspend and each resume. Exactly because completions are
> > designed to be "one-way" things, so you end up having to reset them each
> > cycle (you just reset them even _more_ than you needed).
>
> Well, why actually do we need to preserve the state of the data structure from
> one cycle to another? There's no need whatsoever.

My point is, with locks, none of that is necessary. Because they automatically do the right thing.

By picking the right concept, you don't have any of those "oh, we need to re-initialize things" issues. They just work.

> I still don't think there are many places where locks are used in a way you're
> suggesting. I would even say it's quite unusual to use locks this way.

See above. It's what completions _are_.

> Well, I guess your point is that the implementation of completions is much
> more complicated than we really need, but I'm not sure if that really hurts.

No. The implementation of completions is actually pretty simple, exactly because they have that spinlock that is required to protect them.

That wasn't the point. The point was that locks are actually the "normal" thing to use.

You are arguing as if completions are somehow the simpler model. That's simply not true. Completions are just a _special_case_of_locking_.

So why not just use regular locks instead, when it's actually the natural way to do it, and results in simpler code?

Linus
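
The claim above that a completion is historically "a lock initialized to locked" can be sketched with a counting semaphore that starts at zero. This is a userspace illustration only, assuming POSIX unnamed semaphores and pthreads; the names done_sem, producer and wait_for_result are hypothetical, and the value 42 is arbitrary.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t done_sem;	/* starts at 0, i.e. in the "locked" state */
static int result;

static void *producer(void *unused)
{
	(void)unused;
	result = 42;		/* the work being waited for */
	sem_post(&done_sem);	/* "up()": plays the role of complete() */
	return NULL;
}

/* Start the producer, wait for it through the semaphore, return its result. */
int wait_for_result(void)
{
	pthread_t thread;
	int value;

	result = 0;
	sem_init(&done_sem, 0, 0);	/* initialized to locked */
	pthread_create(&thread, NULL, producer, NULL);

	sem_wait(&done_sem);	/* "down()": plays the role of wait_for_completion() */
	value = result;

	pthread_join(thread, NULL);
	sem_destroy(&done_sem);
	return value;
}
```

The semaphore is "locked" by initialization in one thread and released by another, which is the same cross-thread hand-off that was questioned for the rwsem variant earlier in the thread.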
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-11 1:25 ` Linus Torvalds @ 2009-12-11 3:42 ` Alan Stern 2009-12-11 22:17 ` Rafael J. Wysocki 2009-12-11 22:11 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-11 3:42 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list Up front: This is my personal view of the matter. Which probably isn't of much interest to anybody, so I won't bother to defend these views or comment any further on them. The decision about what version to use is up to the two of you. The fact is, either implementation would get the job done. On Thu, 10 Dec 2009, Linus Torvalds wrote: > Completions really are "locks that were initialized to locked". That is, > in fact, how completions came to be: we literally used to use semaphores > for them, and the reason for completions is literally the magic lifetime > rules they have. > > So when you do > > INIT_COMPLETION(dev->power.completion); > > that really is historically, logically, and conceptually exactly the same > thing as initializing a lock to the locked state. We literally used to do > it with the equivalent of > > init_MUTEX_LOCKED() > > way back when (well, except we didn't have mutexes back then, we had only > counting semaphores) and instead of "complete()", we had "up()" on the > semaphore to complete it. You think of it that way because you have been closely involved in the development of the various kinds of locks. Speaking as an outsider who has relatively little interest in the internal details, completions appear simpler than rwsems. Mostly because they have a smaller API: complete() (or complete_all()) and wait_for_completion() as opposed to down_read(), up_read(), down_write(), and up_write(). > > Besides, suppose a device driver wants some off-tree constraints to be > > satisfied. > > .. 
and I've told you several times that we should simply not do such > devices asynchronously. At least not unless there is some _overriding_ > reason to. And so far, nobody has suggested anything even remotely > likely for that. Agreed. The fact that async non-tree suspend constraints are difficult with rwsems isn't a drawback if nobody needs to use them. > > Well, why actually do we need to preserve the state of the data structure from > > one cycle to another? There's no need whatsoever. > > My point is, with locks, none of that is necessary. Because they > automatically do the right thing. > > By picking the right concept, you don't have any of those "oh, we need to > re-initialize things" issues. They just work. That's true, but it's not entirely clear. There are subtle questions about what happens if you stop in the middle or a device gets unregistered or registered in the middle. They require careful thought in both approaches. Having to reinitialize a completion each time doesn't bother me. It's merely an indication that each suspend & resume is independent of all the others. > > I still don't think there are many places where locks are used in a way you're > > suggesting. I would even say it's quite unusual to use locks this way. > > See above. It's what completions _are_. This is almost a philosophical issue. If each A_i must wait for some B_j's, is the onus on each A_i to test the B_j's it's interested in? Or is the onus on each B_j to tell the A_i's waiting for it that they may proceed? As Humpty-Dumpty said, "The question is which is to be master -- that's all". > > Well, I guess your point is that the implementation of completions is much > > more complicated that we really need, but I'm not sure if that really hurts. > > No. The implementation of completions is actually pretty simple, exactly > because they have that spinlock that is required to protect them. > > That wasn't the point. The point was that locks are actually the "normal" > thing to use. 
>
> You are arguing as if completions are somehow the simpler model. That's
> simply not true. Completions are just a _special_case_of_locking_.

Doesn't that make them simpler by definition? Special cases always have less to worry about than the general case.

> So why not just use regular locks instead, when it's actually the natural
> way to do it, and results in simpler code?

Simpler but also more subtle, IMO. If you didn't already know how the algorithm worked, figuring it out from the code would be harder with rwsems than with completions. Partly because of the way readers and writers exchange roles in suspend vs. resume, and partly because sometimes devices lock themselves and sometimes they lock other devices. With completions each device has its own, and each device waits for other devices' completions -- easier to keep track of mentally.

(I still think this whole readers vs. writers thing is a red herring. The essential property is that there are two opposing classes of lock holders. The fact that multiple writers can't hold the lock at the same time whereas multiple readers can is of no importance; the algorithm would work just as well if multiple writers _could_ hold the lock simultaneously.)

Balancing the additional conceptual complexity of the rwsem approach is the conceptual simplicity afforded by not needing to check all the children. To me this makes it pretty much a toss-up.

Alan Stern
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-11 3:42 ` Alan Stern @ 2009-12-11 22:17 ` Rafael J. Wysocki 2009-12-12 0:38 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-11 22:17 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Friday 11 December 2009, Alan Stern wrote: > Up front: This is my personal view of the matter. Which probably isn't > of much interest to anybody, so I won't bother to defend these views or > comment any further on them. The decision about what version to use is > up to the two of you. The fact is, either implementation would get the > job done. > > On Thu, 10 Dec 2009, Linus Torvalds wrote: > > > Completions really are "locks that were initialized to locked". That is, > > in fact, how completions came to be: we literally used to use semaphores > > for them, and the reason for completions is literally the magic lifetime > > rules they have. > > > > So when you do > > > > INIT_COMPLETION(dev->power.completion); > > > > that really is historically, logically, and conceptually exactly the same > > thing as initializing a lock to the locked state. We literally used to do > > it with the equivalent of > > > > init_MUTEX_LOCKED() > > > > way back when (well, except we didn't have mutexes back then, we had only > > counting semaphores) and instead of "complete()", we had "up()" on the > > semaphore to complete it. > > You think of it that way because you have been closely involved in the > development of the various kinds of locks. Speaking as an outsider who > has relatively little interest in the internal details, completions > appear simpler than rwsems. Mostly because they have a smaller API: > complete() (or complete_all()) and wait_for_completion() as opposed to > down_read(), up_read(), down_write(), and up_write(). Agreed. 
> > > Besides, suppose a device driver wants some off-tree constraints to be > > > satisfied. > > > > .. and I've told you several times that we should simply not do such > > devices asynchronously. At least not unless there is some _overriding_ > > reason to. And so far, nobody has suggested anything even remotely > > likely for that. > > Agreed. The fact that async non-tree suspend constraints are difficult > with rwsems isn't a drawback if nobody needs to use them. Well, see my reply to Linus. The only thing that bothers me is that if we use rwsems, there's no way to handle that even if it turns out that someone needs them after all. > > > Well, why actually do we need to preserve the state of the data structure from > > > one cycle to another? There's no need whatsoever. > > > > My point is, with locks, none of that is necessary. Because they > > automatically do the right thing. > > > > By picking the right concept, you don't have any of those "oh, we need to > > re-initialize things" issues. They just work. > > That's true, but it's not entirely clear. There are subtle questions > about what happens if you stop in the middle or a device gets > unregistered or registered in the middle. They require careful thought > in both approaches. > > Having to reinitialize a completion each time doesn't bother me. It's > merely an indication that each suspend & resume is independent of all > the others. YES! > > > I still don't think there are many places where locks are used in a way you're > > > suggesting. I would even say it's quite unusual to use locks this way. > > > > See above. It's what completions _are_. > > This is almost a philosophical issue. If each A_i must wait for some > B_j's, is the onus on each A_i to test the B_j's it's interested in? > Or is the onus on each B_j to tell the A_i's waiting for it that they > may proceed? As Humpty-Dumpty said, "The question is which is to be > master -- that's all". Agreed. 
> > > Well, I guess your point is that the implementation of completions is much > > > more complicated that we really need, but I'm not sure if that really hurts. > > > > No. The implementation of completions is actually pretty simple, exactly > > because they have that spinlock that is required to protect them. > > > > That wasn't the point. The point was that locks are actually the "normal" > > thing to use. > > > > You are arguing as if completions are somehow the simpler model. That's > > simply not true. Completions are just a _special_case_of_locking_. > > Doesn't that make them simpler by definition? Special cases always > have less to worry about than the general case. Heh, good point. > > So why not just use regular locks instead, when it's actually the natural > > way to do it, and results in simpler code? > > Simpler but also more subtle, IMO. If you didn't already know how the > algorithm worked, figuring it out from the code would be harder with > rwsems than with completions. Indeed. > Partly because of the way readers and > writers exchange roles in suspend vs. resume, and partly because > sometimes devices lock themselves and sometimes they lock other > devices. With completions each device has its own, and each device > waits for other devices' completions -- easier to keep track of > mentally. Agreed again. > (I still think this whole readers vs. writers thing is a red herring. > The essential property is that there are two opposing classes of lock > holders. The fact that multiple writers can't hold the lock at the > same time whereas multiple readers can is of no importance; the > algorithm would work just as well if multiple writers _could_ hold the > lock simultaneously.) > > Balancing the additional conceptual complexity of the rwsem approach is > the conceptual simplicity afforded by not needing to check all the > children. To me this makes it pretty much a toss-up. Yup. Thanks! 
Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-11 22:17 ` Rafael J. Wysocki @ 2009-12-12 0:38 ` Alan Stern 0 siblings, 0 replies; 235+ messages in thread From: Alan Stern @ 2009-12-12 0:38 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Fri, 11 Dec 2009, Rafael J. Wysocki wrote: > > > .. and I've told you several times that we should simply not do such > > > devices asynchronously. At least not unless there is some _overriding_ > > > reason to. And so far, nobody has suggested anything even remotely > > > likely for that. > > > > Agreed. The fact that async non-tree suspend constraints are difficult > > with rwsems isn't a drawback if nobody needs to use them. > > Well, see my reply to Linus. The only thing that bothers me is that if we use > rwsems, there's no way to handle that even if it turns out that someone > needs them after all. This is now a totally moot point, but I want to make it anyway just to show how perverse life can be. It turns out that by combining some of the worst parts of the rwsem approach and the completion approach, it _is_ possible to have async non-tree suspend constraints with rwsems. The key is to imitate the way the completions work. The resume algorithm doesn't change, but the suspend algorithm does. Currently, when suspending a device you first read-lock the parent (to prevent it from suspending too soon), then you asynchronously write-lock the device and suspend it, and finally read-unlock the parent. Instead, you could first write-lock the device (to prevent the parent and any other dependents from suspending too soon), then asynchronously read-lock each of the children and anything else the device needs to wait for, then suspend the device, and finally write-unlock it. 
This really is analogous to completions: down_write() is like init_completion(), up_write() is like complete_all(), and down_read()+up_read() is like wait_for_completion(). I got the idea from Linus's comment that completions really are nothing but locks initialized in the "locked" state. Of course, you would have to iterate over all the children and deal with lockdep complaints. So this obviously is not to be considered as a serious proposal. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
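[Illustration, not part of the original thread.] The completion-style ordering that the rwsem variant above imitates can be sketched in userspace with pthreads. This is only a sketch under stated assumptions: struct completion_sketch and every sketch_* name are hypothetical stand-ins, not kernel APIs. Only their roles correspond to init_completion(), complete_all() and wait_for_completion(). Each "device" starts with its completion not done (the "locked" state), waits for its children, "suspends", and then completes itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a kernel completion. */
struct completion_sketch {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void sketch_init_completion(struct completion_sketch *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;		/* starts out "locked" */
}

static void sketch_complete_all(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;			/* release every current and future waiter */
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void sketch_wait_for_completion(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

#define SKETCH_NDEV 4

struct sketch_device {
	int parent;			/* index of the parent, or -1 for the root */
	struct completion_sketch completion;
};

static struct sketch_device sketch_devs[SKETCH_NDEV];
static int sketch_order[SKETCH_NDEV];	/* order in which devices "suspended" */
static int sketch_count;
static pthread_mutex_t sketch_order_lock = PTHREAD_MUTEX_INITIALIZER;

static void *sketch_async_suspend(void *arg)
{
	struct sketch_device *dev = arg;
	int self = (int)(dev - sketch_devs);
	int i;

	/* Wait for the "suspends" of all children to complete. */
	for (i = 0; i < SKETCH_NDEV; i++)
		if (sketch_devs[i].parent == self)
			sketch_wait_for_completion(&sketch_devs[i].completion);

	/* The "suspend" itself: just record when it happened. */
	pthread_mutex_lock(&sketch_order_lock);
	sketch_order[sketch_count++] = self;
	pthread_mutex_unlock(&sketch_order_lock);

	/* Allow the parent to proceed. */
	sketch_complete_all(&dev->completion);
	return NULL;
}

/* Returns true if every device "suspended" after all of its children. */
static bool sketch_suspend_all(void)
{
	static const int parents[SKETCH_NDEV] = { -1, 0, 0, 1 };
	pthread_t threads[SKETCH_NDEV];
	int pos[SKETCH_NDEV];
	int i;

	sketch_count = 0;
	for (i = 0; i < SKETCH_NDEV; i++) {
		sketch_devs[i].parent = parents[i];
		sketch_init_completion(&sketch_devs[i].completion);
	}

	/* Parents are started first, so only the waits enforce the order. */
	for (i = 0; i < SKETCH_NDEV; i++)
		if (pthread_create(&threads[i], NULL, sketch_async_suspend,
				   &sketch_devs[i]) != 0)
			abort();

	for (i = 0; i < SKETCH_NDEV; i++)
		pthread_join(threads[i], NULL);

	for (i = 0; i < SKETCH_NDEV; i++) {
		pthread_cond_destroy(&sketch_devs[i].completion.cond);
		pthread_mutex_destroy(&sketch_devs[i].completion.lock);
	}

	if (sketch_count != SKETCH_NDEV)
		return false;
	for (i = 0; i < SKETCH_NDEV; i++)
		pos[sketch_order[i]] = i;
	for (i = 0; i < SKETCH_NDEV; i++)
		if (parents[i] >= 0 && pos[i] > pos[parents[i]])
			return false;
	return true;
}
```

Calling sketch_suspend_all() builds a four-node tree (device 0 is the root, 1 and 2 are its children, 3 is a child of 1) and checks the recorded order. An extra, non-tree dependency would simply be one more sketch_wait_for_completion() call before the "suspend" step.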
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-11  1:25 ` Linus Torvalds
  2009-12-11  3:42 ` Alan Stern
@ 2009-12-11 22:11 ` Rafael J. Wysocki
  2009-12-11 22:31 ` Linus Torvalds
  1 sibling, 1 reply; 235+ messages in thread
From: Rafael J. Wysocki @ 2009-12-11 22:11 UTC
To: Linus Torvalds
Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Friday 11 December 2009, Linus Torvalds wrote:
>
> On Fri, 11 Dec 2009, Rafael J. Wysocki wrote:
> >
> > I don't think it really is that simple. For example, the fact that the outer
> > lock has to be taken by one thread and released by another is not exactly
> > straightforward. [One might ask what's the critical section in this case.]
>
> Why is that any different from initializing the completion in one thread,
> and completing it in another?
>
> It's exactly equivalent.
>
> Completions really are "locks that were initialized to locked". That is,
> in fact, how completions came to be: we literally used to use semaphores
> for them, and the reason for completions is literally the magic lifetime
> rules they have.

I don't know how they emerged historically and that's why I look at them in a
different way than you do, probably.

But fine, say we use the approach based on rwsems and consider suspend and
the inner lock. We acquire it using down_write(), because we want to wait for
multiple other drivers. Now, in fact we could do literally

	down_write(&dev->power.rwsem);
	up_write(&dev->power.rwsem);

because the lock doesn't really protect anything from anyone. What it does is
to prevent _us_ from doing something too early. To me, personally, it's not a
usual use of locks.
Moreover, if you think completions should be treated like locks, the up_write()
above plays the role of the INIT_COMPLETION() in my last patch (or vice versa),
so we reinitialize the data structure to the previous state in this case too,
only earlier (and we could do that later just as well).

The only real drawback of using completions I can see is that we have to
iterate over the children during suspend, but if async suspend is going to save
us any time at all, we can easily afford it (resume with completions is actually
simpler than with rwsems, because we only have to wait for one device each
time).

> > Besides, suppose a device driver wants some off-tree constraints to be
> > satisfied.
>
> .. and I've told you several times that we should simply not do such
> devices asynchronously. At least not unless there is some _overriding_
> reason to. And so far, nobody has suggested anything even remotely
> likely for that.
>
> Again - KISS: Keep It Simple, Stupid!
>
> Don't try to make up problems. The _only_ subsystem we know wants this is
> USB, and we know USB is purely a tree.

Not really.

I've already said it once, but let me repeat. Some device objects have those
ACPI "shadow" device objects that represent the ACPI view of given "physical"
device and have their own suspend and resume routines. It turns out that
these ACPI "shadow" devices have to be suspended after their "physical"
counterparts and resumed before them, or else things break really badly.
I don't know the reason for that, I only verified it experimentally (I also
don't like that design, but I didn't invent it and I have to live with it at
least for now). So if we don't enforce these constraints doing async
suspend and resume, we won't be able to handle _any_ devices with those
ACPI "shadow" things asynchronously. Ever. [That includes the majority of
PCI devices, at least the "planar" ones (which is unfortunate, but that's how
it goes).]
If we had a clean way of representing off-tree constraints during asynchronous
suspend and resume, we'd be able to handle this issue at the bus type level.
And even if we don't anticipate it right now, I think the iteration over
children during suspend is a fair price for a clean interface that bus types or
drivers can use in future. YMMV.

> > Well, I guess your point is that the implementation of completions is much
> > more complicated than we really need, but I'm not sure if that really hurts.
>
> No. The implementation of completions is actually pretty simple, exactly
> because they have that spinlock that is required to protect them.
>
> That wasn't the point. The point was that locks are actually the "normal"
> thing to use.
>
> You are arguing as if completions are somehow the simpler model.

That's because I think so.

> That's simply not true. Completions are just a _special_case_of_locking_.

Which doesn't necessarily prevent them from being conceptually simpler than
the locking scheme based on rwsems.

> So why not just use regular locks instead, when it's actually the natural
> way to do it, and results in simpler code?

Well, to me, it's way not natural and, quite frankly, in my not so humble
opinion, it's a matter of personal preference.

But, since your personal preference is what matters in this case, I'm not going
to argue any more, because that just plain doesn't make sense.

So, if you're not fine with the last patch I sent
(http://patchwork.kernel.org/patch/66375/), I'll send one using rwsems instead
of completions just to make _you_ happy, not because I think that's what we
should do objectively.

Rafael
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-11 22:11 ` Rafael J. Wysocki
@ 2009-12-11 22:31 ` Linus Torvalds
  2009-12-11 23:48 ` Rafael J. Wysocki
  0 siblings, 1 reply; 235+ messages in thread
From: Linus Torvalds @ 2009-12-11 22:31 UTC
To: Rafael J. Wysocki
Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Fri, 11 Dec 2009, Rafael J. Wysocki wrote:
>
> But fine, say we use the approach based on rwsems and consider suspend and
> the inner lock. We acquire it using down_write(), because we want to wait for
> multiple other drivers. Now, in fact we could do literally
>
> 	down_write(&dev->power.rwsem);
> 	up_write(&dev->power.rwsem);
>
> because the lock doesn't really protect anything from anyone. What it does is
> to prevent _us_ from doing something too early. To me, personally, it's not a
> usual use of locks.

I agree that it's fairly unusual, but on the other hand, it's unusual only
because you contrived it to be.

If you instead do

	down_write(&dev->power.rwsem);
	.. do the actual suspend ..
	up_write(&dev->power.rwsem);

it doesn't look odd any more, does it? And while you don't _need_ to hold
the power lock over the suspend call, it actually does make sense, and
gives you some nicer guarantees.

For an example of the kinds of guarantees it would give you - I think that
you might actually be able to do a partial suspend and then a resume
without any other locks, and you'd know that just the per-device locking
would already guarantee that no device is ever tried to resume before it
has finished its asynchronous suspend.

Think about it. In the completion model, the "async_synchronize_full()"
will synchronize all async work, and as a result you think that you don't
need that level of robustness from the locking itself.
But think about it this way: if you could abort a failed suspend, and
start resuming devices immediately, without doing that
"async_synchronize_full()" in between - simply because you know that the
node locking itself will just "do the right thing".

To me, that's a sign of a _good_ design. Using a rwsem is simply just more
robust and natural for the problem in question. Exactly because it's a
real lock.

> > Don't try to make up problems. The _only_ subsystem we know wants this is
> > USB, and we know USB is purely a tree.
>
> Not really.
>
> I've already said it once, but let me repeat. Some device objects have those
> ACPI "shadow" device objects that represent the ACPI view of given "physical"
> device and have their own suspend and resume routines. It turns out that
> these ACPI "shadow" devices have to be suspended after their "physical"
> counterparts and resumed before them, or else things break really badly.
> I don't know the reason for that, I only verified it experimentally (I also
> don't like that design, but I didn't invent it and I have to live with it at
> least for now). So if we don't enforce these constraints doing async
> suspend and resume, we won't be able to handle _any_ devices with those
> ACPI "shadow" things asynchronously. Ever. [That includes the majority of
> PCI devices, at least the "planar" ones (which is unfortunate, but that's how
> it goes).]

So?

First off, you're wrong. It's not "ever". I'm happy to add complexity
later, I just don't want to start out with a complex model. Adding
complexity too early "just because we might need it" is the wrong thing to
do.

Secondly, I repeat: we don't want to do those PCI devices asynchronously
anyway. You're again digging yourself deeper by just continually bringing
up this total non-issue. I realize you did it for testing, but I'm serious
when I say that we should limit these things as much as possible, rather
than see it as an opportunity to do crazy things.

Solve the problem at hand _first_.
Solve it as simply as you can. And hope that you never ever will need anything more complex. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-11 22:31 ` Linus Torvalds
@ 2009-12-11 23:48 ` Rafael J. Wysocki
  2009-12-11 23:53 ` Linus Torvalds
  2009-12-12  0:43 ` Alan Stern
  0 siblings, 2 replies; 235+ messages in thread
From: Rafael J. Wysocki @ 2009-12-11 23:48 UTC
To: Linus Torvalds
Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Friday 11 December 2009, Linus Torvalds wrote:
>
> On Fri, 11 Dec 2009, Rafael J. Wysocki wrote:
> >
> > But fine, say we use the approach based on rwsems and consider suspend and
> > the inner lock. We acquire it using down_write(), because we want to wait for
> > multiple other drivers. Now, in fact we could do literally
> >
> > 	down_write(&dev->power.rwsem);
> > 	up_write(&dev->power.rwsem);
> >
> > because the lock doesn't really protect anything from anyone. What it does is
> > to prevent _us_ from doing something too early. To me, personally, it's not a
> > usual use of locks.
>
> I agree that it's fairly unusual, but on the other hand, it's unusual only
> because you contrived it to be.

Whatever. The very fact that you can freely move the up_write() (as long as
it's after the down_write()) is fairly unusual.

> But think about it this way: if you could abort a failed suspend, and
> start resuming devices immediately, without doing that
> "async_synchronize_full()" in between - simply because you know that the
> node locking itself will just "do the right thing".

I'd rather not. :-)

> To me, that's a sign of a _good_ design. Using a rwsem is simply just more
> robust and natural for the problem in question. Exactly because it's a
> real lock.

...

> Solve the problem at hand _first_. Solve it as simply as you can. And hope
> that you never ever will need anything more complex.

Below is a patch I've just tested, but there's a lockdep problem in it I don't
know how to solve.
Namely, lockdep is apparently unhappy with us not releasing
the lock taken in device_suspend() and it complains we take it twice in a row
(which we do, but for another device). I need to use down_read_non_owner()
to make it shut up and then I also need to use up_read_non_owner() in
__device_suspend(), although there's the comment in include/linux/rwsem.h
saying exactly this about that:

/*
 * Take/release a lock when not the owner will release it.
 *
 * [ This API should be avoided as much as possible - the
 *   proper abstraction for this case is completions. ]
 */

(I'd like to know your opinion about that).

Yet, that's not all, because next it complains during resume that
__device_resume() releases a lock it didn't acquire, which it clearly does,
but that is intentional. Unfortunately, there's no up_write_non_owner() ...

So, what am I supposed to do about that?

Rafael

---
 drivers/base/power/main.c    |  107 +++++++++++++++++++++++++++++++++++++++----
 include/linux/device.h       |    6 ++
 include/linux/pm.h           |    3 +
 include/linux/resume-trace.h |    7 ++
 4 files changed, 114 insertions(+), 9 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -26,6 +26,7 @@
 #include <linux/spinlock.h>
 #include <linux/wait.h>
 #include <linux/timer.h>
+#include <linux/rwsem.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -412,9 +413,11 @@ struct dev_pm_info { pm_message_t power_state; unsigned int can_wakeup:1; unsigned int should_wakeup:1; + unsigned async_suspend:1; enum dpm_state status; /* Owned by the PM core */ #ifdef CONFIG_PM_SLEEP struct list_head entry; + struct rw_semaphore rwsem; #endif #ifdef CONFIG_PM_RUNTIME struct timer_list suspend_timer; Index: linux-2.6/drivers/base/power/main.c =================================================================== --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -25,6 +25,7 @@ #include <linux/resume-trace.h> #include <linux/rwsem.h> #include <linux/interrupt.h> +#include <linux/async.h> #include "../base.h" #include "power.h" @@ -42,6 +43,7 @@ LIST_HEAD(dpm_list); static DEFINE_MUTEX(dpm_list_mtx); +static pm_message_t pm_transition; /* * Set once the preparation of devices for a PM transition has started, reset @@ -56,6 +58,7 @@ static bool transition_started; void device_pm_init(struct device *dev) { dev->power.status = DPM_ON; + init_rwsem(&dev->power.rwsem); pm_runtime_init(dev); } @@ -381,17 +384,22 @@ void dpm_resume_noirq(pm_message_t state EXPORT_SYMBOL_GPL(dpm_resume_noirq); /** - * device_resume - Execute "resume" callbacks for given device. + * __device_resume - Execute "resume" callbacks for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. */ -static int device_resume(struct device *dev, pm_message_t state) +static int __device_resume(struct device *dev, pm_message_t state) { + struct device *parent = dev->parent; int error = 0; TRACE_DEVICE(dev); TRACE_RESUME(0); + /* Wait for the parent's resume to complete, if necessary. */ + if (parent) + down_read_nested(&parent->power.rwsem, SINGLE_DEPTH_NESTING); + down(&dev->sem); if (dev->bus) { @@ -426,11 +434,41 @@ static int device_resume(struct device * } End: up(&dev->sem); + if (parent) + up_read(&parent->power.rwsem); + + /* Allow the children to resume now. 
*/ + up_write(&dev->power.rwsem); TRACE_RESUME(error); return error; } +static void async_resume(void *data, async_cookie_t cookie) +{ + struct device *dev = (struct device *)data; + int error; + + error = __device_resume(dev, pm_transition); + if (error) + pm_dev_err(dev, pm_transition, " async", error); + put_device(dev); +} + +static int device_resume(struct device *dev) +{ + /* Prevent the children from resuming before us. */ + down_write(&dev->power.rwsem); + + if (dev->power.async_suspend && !pm_trace_is_enabled()) { + get_device(dev); + async_schedule(async_resume, dev); + return 0; + } + + return __device_resume(dev, pm_transition); +} + /** * dpm_resume - Execute "resume" callbacks for non-sysdev devices. * @state: PM transition of the system being carried out. @@ -444,6 +482,7 @@ static void dpm_resume(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); + pm_transition = state; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.next); @@ -454,7 +493,7 @@ static void dpm_resume(pm_message_t stat dev->power.status = DPM_RESUMING; mutex_unlock(&dpm_list_mtx); - error = device_resume(dev, state); + error = device_resume(dev); mutex_lock(&dpm_list_mtx); if (error) @@ -469,6 +508,7 @@ static void dpm_resume(pm_message_t stat } list_splice(&list, &dpm_list); mutex_unlock(&dpm_list_mtx); + async_synchronize_full(); } /** @@ -584,13 +624,11 @@ static int device_suspend_noirq(struct d { int error = 0; - if (!dev->bus) - return 0; - - if (dev->bus->pm) { + if (dev->bus && dev->bus->pm) { pm_dev_dbg(dev, state, "LATE "); error = pm_noirq_op(dev, dev->bus->pm, state); } + return error; } @@ -623,17 +661,24 @@ int dpm_suspend_noirq(pm_message_t state } EXPORT_SYMBOL_GPL(dpm_suspend_noirq); +static int async_error; + /** * device_suspend - Execute "suspend" callbacks for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. 
*/ -static int device_suspend(struct device *dev, pm_message_t state) +static int __device_suspend(struct device *dev, pm_message_t state) { int error = 0; + /* Wait for the suspends of the children to complete, if necessary. */ + down_write_nested(&dev->power.rwsem, SINGLE_DEPTH_NESTING); down(&dev->sem); + if (async_error) + goto End; + if (dev->class) { if (dev->class->pm) { pm_dev_dbg(dev, state, "class "); @@ -666,12 +711,50 @@ static int device_suspend(struct device suspend_report_result(dev->bus->suspend, error); } } + + if (!error) + dev->power.status = DPM_OFF; + End: up(&dev->sem); + up_write(&dev->power.rwsem); + + /* Allow the parent to suspend now. */ + if (dev->parent) + up_read_non_owner(&dev->parent->power.rwsem); return error; } +static void async_suspend(void *data, async_cookie_t cookie) +{ + struct device *dev = (struct device *)data; + int error; + + error = __device_suspend(dev, pm_transition); + if (error) { + pm_dev_err(dev, pm_transition, " async", error); + async_error = error; + } + + put_device(dev); +} + +static int device_suspend(struct device *dev, pm_message_t state) +{ + /* Prevent the parent from suspending before us. */ + if (dev->parent) + down_read_non_owner(&dev->parent->power.rwsem); + + if (dev->power.async_suspend) { + get_device(dev); + async_schedule(async_suspend, dev); + return 0; + } + + return __device_suspend(dev, pm_transition); +} + /** * dpm_suspend - Execute "suspend" callbacks for all non-sysdev devices. * @state: PM transition of the system being carried out. 
@@ -683,6 +766,7 @@ static int dpm_suspend(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); + pm_transition = state; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.prev); @@ -697,13 +781,17 @@ static int dpm_suspend(pm_message_t stat put_device(dev); break; } - dev->power.status = DPM_OFF; if (!list_empty(&dev->power.entry)) list_move(&dev->power.entry, &list); put_device(dev); + if (async_error) + break; } list_splice(&list, dpm_list.prev); mutex_unlock(&dpm_list_mtx); + async_synchronize_full(); + if (!error) + error = async_error; return error; } @@ -762,6 +850,7 @@ static int dpm_prepare(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); transition_started = true; + async_error = 0; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.next); Index: linux-2.6/include/linux/resume-trace.h =================================================================== --- linux-2.6.orig/include/linux/resume-trace.h +++ linux-2.6/include/linux/resume-trace.h @@ -6,6 +6,11 @@ extern int pm_trace_enabled; +static inline int pm_trace_is_enabled(void) +{ + return pm_trace_enabled; +} + struct device; extern void set_trace_device(struct device *); extern void generate_resume_trace(const void *tracedata, unsigned int user); @@ -17,6 +22,8 @@ extern void generate_resume_trace(const #else +static inline int pm_trace_is_enabled(void) { return 0; } + #define TRACE_DEVICE(dev) do { } while (0) #define TRACE_RESUME(dev) do { } while (0) Index: linux-2.6/include/linux/device.h =================================================================== --- linux-2.6.orig/include/linux/device.h +++ linux-2.6/include/linux/device.h @@ -472,6 +472,12 @@ static inline int device_is_registered(s return dev->kobj.state_in_sysfs; } +static inline void device_enable_async_suspend(struct device *dev, bool enable) +{ + if (dev->power.status == DPM_ON) + dev->power.async_suspend = enable; +} + void driver_init(void); /* ^ 
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-11 23:48 ` Rafael J. Wysocki @ 2009-12-11 23:53 ` Linus Torvalds 2009-12-12 17:48 ` Rafael J. Wysocki 2009-12-12 0:43 ` Alan Stern 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-11 23:53 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sat, 12 Dec 2009, Rafael J. Wysocki wrote: > > Below is a patch I've just tested, but there's a lockdep problem in it I don't > know how to solve. Namely, lockdep is apparently unhappy with us not releasing > the lock taken in device_suspend() and it complains we take it twice in a row > (which we do, but for another device). I need to use down_read_non_owner() > to make it shut up and then I also need to use up_read_non_owner() in > __device_suspend(), Ok, that I admit is actually a problem. Ok, ok, I'll accept that completion() version, even though I think it's inferior. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-11 23:53 ` Linus Torvalds @ 2009-12-12 17:48 ` Rafael J. Wysocki 2009-12-12 18:54 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-12 17:48 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Saturday 12 December 2009, Linus Torvalds wrote: > > On Sat, 12 Dec 2009, Rafael J. Wysocki wrote: > > > > Below is a patch I've just tested, but there's a lockdep problem in it I don't > > know how to solve. Namely, lockdep is apparently unhappy with us not releasing > > the lock taken in device_suspend() and it complains we take it twice in a row > > (which we do, but for another device). I need to use down_read_non_owner() > > to make it shut up and then I also need to use up_read_non_owner() in > > __device_suspend(), > > Ok, that I admit is actually a problem. > > Ok, ok, I'll accept that completion() version, even though I think it's > inferior. Great! :-) I slightly changed it in the meantime to avoid calling wait_for_completion() when both the parent and the child are "synchronous", which prevents the code from choking on some situations when the ordering of dpm_list is wrong (this happens as a result of bugs, but not necessarily fatal, for example if one of the drivers' suspend and resume callbacks are NULL and the bus type doesn't access the hardware directly, so we shouldn't make things worse than they already are IMO). I'd like to put it into my tree in this form, if you don't mind. [Note for Alan: dpm_wait() is not exported for now, we'll export it when there are any users.] Rafael --- From: Rafael J. 
Wysocki <rjw@sisk.pl> Subject: PM: Asynchronous suspend and resume of devices Theoretically, the total time of system sleep transitions (suspend to RAM, hibernation) can be reduced by running suspend and resume callbacks of device drivers in parallel with each other. However, there are dependencies between devices such that we're not allowed to suspend the parent of a device before suspending the device itself. Analogously, we're not allowed to resume a device before resuming its parent. Thus, to make it possible to execute device drivers' suspend and resume callbacks in parallel with each other, introduce (at the PM core level) a synchronization mechanism preventing the dependencies between devices from being violated. First, device drivers that want their suspend and resume callbacks to be run asynchronously need to set the power.async_suspend flags of their devices using device_enable_async_suspend(). Second, for each device with the power.async_suspend flag set the PM core will start async threads to execute its suspend and resume callbacks. The async threads started for different devices are synchronized with each other and with the main suspend (or resume) thread with the help of completions, in the following way: (1) There is a completion, power.completion, for each device object. (2) Each device's completion is reset before starting the async suspend (or resume) thread for the device or, in the case of devices whose power.async_suspend flags are not set, before executing the device's suspend and resume callbacks. (3) During suspend, right before running the bus type, device type and device class suspend callbacks for the device, the PM core waits for the completions of all the device's children to be completed. (4) During resume, right before running the bus type, device type and device class resume callbacks for the device, the PM core waits for the completion of the device's parent to be completed. 
(5) The PM core completes power.completion for each device right after the bus type, device type and device class suspend (or resume) callbacks executed for the device have returned. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> --- drivers/base/power/main.c | 115 ++++++++++++++++++++++++++++++++++++++++--- include/linux/device.h | 6 ++ include/linux/pm.h | 3 + include/linux/resume-trace.h | 7 ++ 4 files changed, 125 insertions(+), 6 deletions(-) Index: linux-2.6/include/linux/pm.h =================================================================== --- linux-2.6.orig/include/linux/pm.h +++ linux-2.6/include/linux/pm.h @@ -26,6 +26,7 @@ #include <linux/spinlock.h> #include <linux/wait.h> #include <linux/timer.h> +#include <linux/completion.h> /* * Callbacks for platform drivers to implement. @@ -412,9 +413,11 @@ struct dev_pm_info { pm_message_t power_state; unsigned int can_wakeup:1; unsigned int should_wakeup:1; + unsigned async_suspend:1; enum dpm_state status; /* Owned by the PM core */ #ifdef CONFIG_PM_SLEEP struct list_head entry; + struct completion completion; #endif #ifdef CONFIG_PM_RUNTIME struct timer_list suspend_timer; Index: linux-2.6/drivers/base/power/main.c =================================================================== --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -25,6 +25,7 @@ #include <linux/resume-trace.h> #include <linux/rwsem.h> #include <linux/interrupt.h> +#include <linux/async.h> #include "../base.h" #include "power.h" @@ -42,6 +43,7 @@ LIST_HEAD(dpm_list); static DEFINE_MUTEX(dpm_list_mtx); +static pm_message_t pm_transition; /* * Set once the preparation of devices for a PM transition has started, reset @@ -56,6 +58,7 @@ static bool transition_started; void device_pm_init(struct device *dev) { dev->power.status = DPM_ON; + init_completion(&dev->power.completion); pm_runtime_init(dev); } @@ -111,6 +114,7 @@ void device_pm_remove(struct device *dev pr_debug("PM: Removing info for %s:%s\n", 
dev->bus ? dev->bus->name : "No Bus", kobject_name(&dev->kobj)); + complete_all(&dev->power.completion); mutex_lock(&dpm_list_mtx); list_del_init(&dev->power.entry); mutex_unlock(&dpm_list_mtx); @@ -162,6 +166,31 @@ void device_pm_move_last(struct device * } /** + * dpm_wait - Wait for a PM operation to complete. + * @dev: Device to wait for. + * @async: If unset, wait only if the device's power.async_suspend flag is set. + */ +static void dpm_wait(struct device *dev, bool async) +{ + if (!dev) + return; + + if (async || dev->power.async_suspend) + wait_for_completion(&dev->power.completion); +} + +static int dpm_wait_fn(struct device *dev, void *async_ptr) +{ + dpm_wait(dev, *((bool *)async_ptr)); + return 0; +} + +static void dpm_wait_for_children(struct device *dev, bool async) +{ + device_for_each_child(dev, &async, dpm_wait_fn); +} + +/** * pm_op - Execute the PM operation appropriate for given PM event. * @dev: Device to handle. * @ops: PM operations to choose from. @@ -381,17 +410,19 @@ void dpm_resume_noirq(pm_message_t state EXPORT_SYMBOL_GPL(dpm_resume_noirq); /** - * device_resume - Execute "resume" callbacks for given device. + * __device_resume - Execute "resume" callbacks for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. + * @async: If true, the device is being resumed asynchronously. 
*/ -static int device_resume(struct device *dev, pm_message_t state) +static int __device_resume(struct device *dev, pm_message_t state, bool async) { int error = 0; TRACE_DEVICE(dev); TRACE_RESUME(0); + dpm_wait(dev->parent, async); down(&dev->sem); if (dev->bus) { @@ -426,11 +457,36 @@ static int device_resume(struct device * } End: up(&dev->sem); + complete_all(&dev->power.completion); TRACE_RESUME(error); return error; } +static void async_resume(void *data, async_cookie_t cookie) +{ + struct device *dev = (struct device *)data; + int error; + + error = __device_resume(dev, pm_transition, true); + if (error) + pm_dev_err(dev, pm_transition, " async", error); + put_device(dev); +} + +static int device_resume(struct device *dev) +{ + INIT_COMPLETION(dev->power.completion); + + if (dev->power.async_suspend && !pm_trace_is_enabled()) { + get_device(dev); + async_schedule(async_resume, dev); + return 0; + } + + return __device_resume(dev, pm_transition, false); +} + /** * dpm_resume - Execute "resume" callbacks for non-sysdev devices. * @state: PM transition of the system being carried out. @@ -444,6 +500,7 @@ static void dpm_resume(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); + pm_transition = state; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.next); @@ -454,7 +511,7 @@ static void dpm_resume(pm_message_t stat dev->power.status = DPM_RESUMING; mutex_unlock(&dpm_list_mtx); - error = device_resume(dev, state); + error = device_resume(dev); mutex_lock(&dpm_list_mtx); if (error) @@ -469,6 +526,7 @@ static void dpm_resume(pm_message_t stat } list_splice(&list, &dpm_list); mutex_unlock(&dpm_list_mtx); + async_synchronize_full(); } /** @@ -623,17 +681,24 @@ int dpm_suspend_noirq(pm_message_t state } EXPORT_SYMBOL_GPL(dpm_suspend_noirq); +static int async_error; + /** * device_suspend - Execute "suspend" callbacks for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. 
+ * @async: If true, the device is being suspended asynchronously. */ -static int device_suspend(struct device *dev, pm_message_t state) +static int __device_suspend(struct device *dev, pm_message_t state, bool async) { int error = 0; + dpm_wait_for_children(dev, async); down(&dev->sem); + if (async_error) + goto End; + if (dev->class) { if (dev->class->pm) { pm_dev_dbg(dev, state, "class "); @@ -666,12 +731,44 @@ static int device_suspend(struct device suspend_report_result(dev->bus->suspend, error); } } + + if (!error) + dev->power.status = DPM_OFF; + End: up(&dev->sem); + complete_all(&dev->power.completion); return error; } +static void async_suspend(void *data, async_cookie_t cookie) +{ + struct device *dev = (struct device *)data; + int error; + + error = __device_suspend(dev, pm_transition, true); + if (error) { + pm_dev_err(dev, pm_transition, " async", error); + async_error = error; + } + + put_device(dev); +} + +static int device_suspend(struct device *dev) +{ + INIT_COMPLETION(dev->power.completion); + + if (dev->power.async_suspend) { + get_device(dev); + async_schedule(async_suspend, dev); + return 0; + } + + return __device_suspend(dev, pm_transition, false); +} + /** * dpm_suspend - Execute "suspend" callbacks for all non-sysdev devices. * @state: PM transition of the system being carried out. 
@@ -683,13 +780,15 @@ static int dpm_suspend(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); + pm_transition = state; + async_error = 0; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.prev); get_device(dev); mutex_unlock(&dpm_list_mtx); - error = device_suspend(dev, state); + error = device_suspend(dev); mutex_lock(&dpm_list_mtx); if (error) { @@ -697,13 +796,17 @@ static int dpm_suspend(pm_message_t stat put_device(dev); break; } - dev->power.status = DPM_OFF; if (!list_empty(&dev->power.entry)) list_move(&dev->power.entry, &list); put_device(dev); + if (async_error) + break; } list_splice(&list, dpm_list.prev); mutex_unlock(&dpm_list_mtx); + async_synchronize_full(); + if (!error) + error = async_error; return error; } Index: linux-2.6/include/linux/resume-trace.h =================================================================== --- linux-2.6.orig/include/linux/resume-trace.h +++ linux-2.6/include/linux/resume-trace.h @@ -6,6 +6,11 @@ extern int pm_trace_enabled; +static inline int pm_trace_is_enabled(void) +{ + return pm_trace_enabled; +} + struct device; extern void set_trace_device(struct device *); extern void generate_resume_trace(const void *tracedata, unsigned int user); @@ -17,6 +22,8 @@ extern void generate_resume_trace(const #else +static inline int pm_trace_is_enabled(void) { return 0; } + #define TRACE_DEVICE(dev) do { } while (0) #define TRACE_RESUME(dev) do { } while (0) Index: linux-2.6/include/linux/device.h =================================================================== --- linux-2.6.orig/include/linux/device.h +++ linux-2.6/include/linux/device.h @@ -472,6 +472,12 @@ static inline int device_is_registered(s return dev->kobj.state_in_sysfs; } +static inline void device_enable_async_suspend(struct device *dev, bool enable) +{ + if (dev->power.status == DPM_ON) + dev->power.async_suspend = enable; +} + void driver_init(void); /* ^ permalink raw reply [flat|nested] 235+ messages in thread
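The ordering rule the patch relies on (a device waits on its children's power.completion before running its own suspend callbacks, then completes its own) can be tried out in userspace. What follows is only a sketch under stated assumptions: the toy_* names are hypothetical, a pthread mutex and condition variable stand in for the kernel's struct completion, and plain threads stand in for async_schedule(). It is not the kernel implementation.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct completion (an assumption). */
struct toy_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void toy_init_completion(struct toy_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Like complete_all(): wake every current and future waiter. */
static void toy_complete_all(struct toy_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void toy_wait_for_completion(struct toy_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

#define TOY_MAX_CHILDREN 4

struct toy_device {
	struct toy_device *children[TOY_MAX_CHILDREN];
	size_t nr_children;
	struct toy_completion completion;
	int suspend_seq;	/* position in the order of finished suspends */
};

static pthread_mutex_t toy_seq_lock = PTHREAD_MUTEX_INITIALIZER;
static int toy_next_seq;

/* Thread body: rough counterpart of __device_suspend() with async == true. */
static void *toy_async_suspend(void *data)
{
	struct toy_device *dev = data;
	size_t i;

	/* dpm_wait_for_children(): block until every child has completed. */
	for (i = 0; i < dev->nr_children; i++)
		toy_wait_for_completion(&dev->children[i]->completion);

	/* The suspend callbacks would run here. */
	pthread_mutex_lock(&toy_seq_lock);
	dev->suspend_seq = toy_next_seq++;
	pthread_mutex_unlock(&toy_seq_lock);

	toy_complete_all(&dev->completion);
	return NULL;
}

/*
 * Suspend a parent with two children, deliberately starting the parent's
 * thread first. Returns the parent's position in the completion order.
 */
int toy_parent_suspend_position(void)
{
	struct toy_device kids[2] = { { .nr_children = 0 }, { .nr_children = 0 } };
	struct toy_device parent = {
		.children = { &kids[0], &kids[1] },
		.nr_children = 2,
	};
	pthread_t threads[3];
	int i;

	toy_next_seq = 0;
	toy_init_completion(&parent.completion);
	toy_init_completion(&kids[0].completion);
	toy_init_completion(&kids[1].completion);

	pthread_create(&threads[0], NULL, toy_async_suspend, &parent);
	pthread_create(&threads[1], NULL, toy_async_suspend, &kids[0]);
	pthread_create(&threads[2], NULL, toy_async_suspend, &kids[1]);

	/* Rough equivalent of async_synchronize_full(). */
	for (i = 0; i < 3; i++)
		pthread_join(threads[i], NULL);

	return parent.suspend_seq;
}
```

Even though the parent's thread is created first, toy_parent_suspend_position() returns 2: the parent can only take its sequence number after both children have completed. Build with -pthread where the toolchain requires it.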
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-12 17:48 ` Rafael J. Wysocki @ 2009-12-12 18:54 ` Linus Torvalds 2009-12-12 22:34 ` Rafael J. Wysocki ` (2 more replies) 0 siblings, 3 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-12 18:54 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sat, 12 Dec 2009, Rafael J. Wysocki wrote: > > I'd like to put it into my tree in this form, if you don't mind. This version still has a major problem, which is not related to completions vs rwsems, but simply to the fact that you wanted to do this at the generic device layer level rather than do it at the actual low-level suspend/resume level. Namely that there's no apparent sane way to say "don't wait for children". PCI bridges that don't suspend at all - or any other device that only suspends in the 'suspend_late()' thing, for that matter - don't have any reason what-so-ever to wait for children, since they aren't actually suspending in the first place. But you make them wait regardless, which then serializes things unnecessarily (for example, two unrelated USB controllers). And no, making _everything_ be async is _not_ the answer. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-12 18:54 ` Linus Torvalds @ 2009-12-12 22:34 ` Rafael J. Wysocki 2009-12-12 22:40 ` Rafael J. Wysocki 2009-12-14 18:21 ` Linus Torvalds 2009-12-13 13:08 ` Rafael J. Wysocki 2009-12-13 17:30 ` Alan Stern 2 siblings, 2 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-12 22:34 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Saturday 12 December 2009, Linus Torvalds wrote: > > On Sat, 12 Dec 2009, Rafael J. Wysocki wrote: > > > > I'd like to put it into my tree in this form, if you don't mind. > > This version still has a major problem, which is not related to > completions vs rwsems, but simply to the fact that you wanted to do this > at the generic device layer level rather than do it at the actual > low-level suspend/resume level. > > Namely that there's no apparent sane way to say "don't wait for children". > > PCI bridges that don't suspend at all - or any other device that only > suspends in the 'suspend_late()' thing, for that matter - don't have any > reason what-so-ever to wait for children, since they aren't actually > suspending in the first place. But you make them wait regardless, which > then serializes things unnecessarily (for example, two unrelated USB > controllers). This is a problem that needs to be solved. One solution that we have discussed on linux-pm is to start a bunch of async threads searching for async devices that can be suspended and suspending them (assuming suspend is considered) out of order with respect to dpm_list. For example, leaf async devices can always be suspended at the same time regardless of their positions in dpm_list. This way we could get almost the entire gain resulting from suspending or resuming devices in parallel without bothering drivers with the problem of dependencies that need to be honoured. 
That's something we can add on top of this patch, though, not to complicate things from the start and it surely requires more discussion. > And no, making _everything_ be async is _not_ the answer. I'm not sure what you mean, really. Speaking of PCI bridges, even though they don't "suspend" in the sense of being put into low power states or something, we still need to save their registers on suspend and restore them on resume, and that restore has to be done before we start to access devices below the bridge. There are devices with totally null suspend and resume routines that even the bus type doesn't really handle, but those can be marked as "async" from the start and they won't really get in the way any more (this creates another issue to solve, namely that we shouldn't really start a new async thread for each of them; we have considered that too). Even if we move that all to drivers, the constraints won't go away and someone will have to take care of them. Now, since _we_ have problems with reaching an agreement about how to do it, the driver writers will be even less likely to figure that out. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-12 22:34 ` Rafael J. Wysocki @ 2009-12-12 22:40 ` Rafael J. Wysocki 2009-12-14 18:21 ` Linus Torvalds 1 sibling, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-12 22:40 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Saturday 12 December 2009, Rafael J. Wysocki wrote: > On Saturday 12 December 2009, Linus Torvalds wrote: > > > > On Sat, 12 Dec 2009, Rafael J. Wysocki wrote: > > > ... > > > And no, making _everything_ be async is _not_ the answer. > > I'm not sure what you mean, really. > > Speaking of PCI bridges, even though they don't "suspend" in the sense of > being put into low power states or something, we still need to save their > registers on suspend and restore them on resume, and that restore has to > be done before we start to access devices below the bridge. Of course we restore them at the early stage now so the above remark doesn't apply to the patch in question, sorry. But the one below does. > Even if we move that all to drivers, the constraints won't go away and someone > will have to take care of them. Now, since _we_ have problems with reaching > an agreement about how to do it, the driver writers will be even less likely to > figure that out. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-12 22:34 ` Rafael J. Wysocki 2009-12-12 22:40 ` Rafael J. Wysocki @ 2009-12-14 18:21 ` Linus Torvalds 2009-12-14 22:11 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-14 18:21 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sat, 12 Dec 2009, Rafael J. Wysocki wrote: > > One solution that we have discussed on linux-pm is to start a bunch of async > threads searching for async devices that can be suspended and suspending > them (assuming suspend is considered) out of order with respect to dpm_list. Ok, guys, stop the crazy. That's another of those "ok, that's just totally stupid and clearly too complex" ideas that I would never pull. I should seriously suggest that people just stop discussing architectural details on the pm list if they all end up being this level of crazy. The sane thing to do is to just totally ignore the async layer on PCI bridges and other things that only have a late-suspend/early-resume thing. No need for the above kind of obviously idiotic crap. However, my point was really that we wouldn't even have _needed_ that kind of special case if we had just decided to let the subsystems do it. But whatever. At worst, the PCI layer can even just mark such devices with just late/early suspend/resume as being asynchronous, even though that ends up resulting in some totally pointless async work that doesn't do anything. But please guys - rein in the crazy ideas on the pm list. It's not like our suspend/resume has gotten so stable as to be boring, and we want it to become unreliable again. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-14 18:21 ` Linus Torvalds @ 2009-12-14 22:11 ` Rafael J. Wysocki 2009-12-14 22:41 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-14 22:11 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Monday 14 December 2009, Linus Torvalds wrote: > > On Sat, 12 Dec 2009, Rafael J. Wysocki wrote: > > > > One solution that we have discussed on linux-pm is to start a bunch of async > > threads searching for async devices that can be suspended and suspending > > them (assuming suspend is considered) out of order with respect to dpm_list. > > Ok, guys, stop the crazy. > > That's another of those "ok, that's just totally stupid and clearly too > complex" ideas that I would never pull. > > I should seriously suggest that people just stop discussing architectural > details on the pm list if they all end up being this level of crazy. > > The sane thing to do is to just totally ignore the async layer on PCI > bridges and other things that only have a late-suspend/early-resume thing. > No need for the above kind of obviously idiotic crap. > > However, my point was really that we wouldn't even have _needed_ that kind > of special case if we had just decided to let the subsystems do it. But > whatever. At worst, the PCI layer can even just mark such devices with > just late/early suspend/resume as being asynchronous, even though that > ends up resulting in some totally pointless async work that doesn't do > anything. > > But please guys - rein in the crazy ideas on the pm list. It's not like > our suspend/resume has gotten so stable as to be boring, and we want it to > become unreliable again. Indeed. OK, what about a two-pass approach in which the first pass only inits the completions and starts async threads for leaf "async" devices? 
I think leaf devices are most likely to take much time to suspend, so this will give us a chance to save quite some time. A more aggressive version of this might start the async threads for all async devices in the first pass and then only handle the synchronous ones in the second pass - as long as there are only a few async devices that should be quite efficient. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-14 22:11 ` Rafael J. Wysocki @ 2009-12-14 22:41 ` Linus Torvalds 2009-12-14 22:43 ` Linus Torvalds 2009-12-14 23:18 ` Rafael J. Wysocki 0 siblings, 2 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-14 22:41 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Mon, 14 Dec 2009, Rafael J. Wysocki wrote: > > OK, what about a two-pass approach in which the first pass only inits the > completions and starts async threads for leaf "async" devices? I think leaf > devices are most likely to take much time to suspend, so this will give us > a chance to save quite some time. Why? Really. Again, stop making it harder than it needs to be. Why do you make up these crazy schemes that are way more complex than they need to be? Here's an untested one-liner that has a 10-line comment. I agree it is ugly, but it is ugly exactly because the generic device layer _forces_ us to wait for children even when we don't want to. With this, that unnecessary wait is now done asynchronously. I'd rather do it some other way - perhaps having an explicit flag that says "don't wait for children because I'm not going to suspend myself until 'suspend_late' _anyway_". But at least this is _simple_. Linus --- drivers/pci/probe.c | 11 +++++++++++ 1 files changed, 11 insertions(+), 0 deletions(-) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 98ffb2d..4e0ad7b 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -437,6 +437,17 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent, } bridge->subordinate = child; + /* + * We don't really suspend PCI buses asynchronously. 
+ * + * However, since we don't actually suspend them at all until + * the late phase, we might as well lie to the device layer + * and ask it to do our no-op not-suspend asynchronously, so that + * we end up not synchronizing with any of our child devices + * that might want to be asynchronous. + */ + bridge->dev.power.async_suspend = 1; + return child; } ^ permalink raw reply related [flat|nested] 235+ messages in thread
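The one-liner writes the flag directly, while the async suspend patch earlier in the thread adds a device_enable_async_suspend() helper that only takes effect while power.status is DPM_ON. A userspace mock of that guard is sketched below. The toy_* types are hypothetical cut-down stand-ins for struct device and enum dpm_state, and whether the bridge is still in DPM_ON at the point where pci_alloc_child_bus() would call the helper is an assumption this thread does not settle.

```c
#include <stdbool.h>

/* Cut-down userspace mock of the fields the patch touches (an assumption). */
enum toy_dpm_state { TOY_DPM_ON, TOY_DPM_RESUMING, TOY_DPM_OFF };

struct toy_dev {
	enum toy_dpm_state status;
	unsigned int async_suspend:1;
};

/* Same logic as device_enable_async_suspend() in the patch above. */
void toy_enable_async_suspend(struct toy_dev *dev, bool enable)
{
	if (dev->status == TOY_DPM_ON)
		dev->async_suspend = enable;
}

/* Return the flag after trying to enable async suspend in a given state. */
int toy_try_enable(enum toy_dpm_state status)
{
	struct toy_dev dev = { .status = status, .async_suspend = 0 };

	toy_enable_async_suspend(&dev, true);
	return dev.async_suspend;
}
```

The point of the guard is that the flag cannot change in the middle of a transition: toy_try_enable(TOY_DPM_ON) yields 1, while the same call for a device that is already off leaves the flag at 0.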
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-14 22:41 ` Linus Torvalds @ 2009-12-14 22:43 ` Linus Torvalds 2009-12-14 23:18 ` Rafael J. Wysocki 1 sibling, 0 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-14 22:43 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Mon, 14 Dec 2009, Linus Torvalds wrote: > > Here's an untested one-liner that has a 10-line comment. Btw, when I say "untested", in this case I mean that it isn't even compile-tested. I haven't merged your other patches yet, so in my tree that 'async_suspend' flag doesn't even exist, and the patch I sent out definitely doesn't compile. But it _might_ compile (and perhaps even work) in your tree. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-14 22:41 ` Linus Torvalds 2009-12-14 22:43 ` Linus Torvalds @ 2009-12-14 23:18 ` Rafael J. Wysocki 2009-12-15 0:10 ` Linus Torvalds 1 sibling, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-14 23:18 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Monday 14 December 2009, Linus Torvalds wrote: > > On Mon, 14 Dec 2009, Rafael J. Wysocki wrote: > > > > OK, what about a two-pass approach in which the first pass only inits the > > completions and starts async threads for leaf "async" devices? I think leaf > > devices are most likely to take much time to suspend, so this will give us > > a chance to save quite some time. > > Why? > > Really. Because the PCI bridges are not the only case where it matters (I'd say they are really a corner case). Basically, any two async devices separated by a series of sync ones are likely not to be suspended (or resumed) in parallel with each other, because the parent is usually next to its children in dpm_list. So, if the first device suspends, its "synchronous" parent waits for it and the suspend of the second async device won't be started until the first one's suspend has returned. And it doesn't matter at what level we do the async thing, because dpm_list is there anyway. As Alan said, the real problem is that we generally can't change the ordering of dpm_list arbitrarily, because we don't know what's going to happen as a result. The async_suspend flag tells us, basically, what devices can be safely moved to different positions in dpm_list without breaking things, as long as they are not moved behind their parents or in front of their children. 
Starting the async suspends upfront would effectively work in the same way as moving those devices to the beginning of dpm_list without breaking the parent-child chains, which in turn is likely to allow us to save some extra time. That's not only about the PCI bridges, it's more general. As far as your one-liner is concerned, I'm going to test it, because I think we could use it anyway. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-14 23:18 ` Rafael J. Wysocki @ 2009-12-15 0:10 ` Linus Torvalds 2009-12-15 0:11 ` Linus Torvalds 2009-12-15 11:03 ` Rafael J. Wysocki 0 siblings, 2 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-15 0:10 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 15 Dec 2009, Rafael J. Wysocki wrote: > > Because the PCI bridges are not the only case where it matters (I'd say they > are really a corner case). Basically, any two async devices separated by a > series of sync ones are likely not to be suspended (or resumed) in parallel > with each other, because the parent is usually next to its children in dpm_list. Give a real example that matters. Really. How hard can it be to understand: KISS. Keep It Simple, Stupid. I get really tired of this whole stupid async discussion, because you're overdesigning it. To a first approximation, THE ONLY THING THAT MATTERS IS USB. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-15 0:10 ` Linus Torvalds @ 2009-12-15 0:11 ` Linus Torvalds 2009-12-15 11:14 ` Rafael J. Wysocki 2009-12-15 11:03 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-15 0:11 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Mon, 14 Dec 2009, Linus Torvalds wrote: > > I get really tired of this whole stupid async discussion, because you're > overdesigning it. Btw, this is important. I'm not going to pull even your _current_ async stuff if you can't show that you fundamentally UNDERSTAND this fact. Stop making up idiotic complex interfaces. Look at my one-liner patch, and realize that it gets you 99% there - the 99% that matters. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-15 0:11 ` Linus Torvalds @ 2009-12-15 11:14 ` Rafael J. Wysocki 2009-12-15 15:31 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-15 11:14 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tuesday 15 December 2009, Linus Torvalds wrote: > > On Mon, 14 Dec 2009, Linus Torvalds wrote: > > > > I get really tired of this whole stupid async discussion, because you're > > overdesigning it. > > Btw, this is important. I'm not going to pull even your _current_ async > stuff if you can't show that you fundamentally UNDERSTAND this fact. What fact? The only thing that matters is USB? For resume, it is. For suspend, it clearly isn't. > Stop making up idiotic complex interfaces. Look at my one-liner patch, and > realize that it gets you 99% there - the 99% that matters. I said I was going to use it, but I don't think that's going to be sufficient. [BTW, I'm not sure what you want to achieve by insulting me. Either you may want to scare me, but I'm not scared, or you may want to try to make me so disgusted that I'll just give up and back off, but this is not going to happen either.] Insults aside, I'm going to make some measurements to see how much time we can save. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-15 11:14 ` Rafael J. Wysocki @ 2009-12-15 15:31 ` Linus Torvalds 0 siblings, 0 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-15 15:31 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 15 Dec 2009, Rafael J. Wysocki wrote: > > What fact? The only thing that matters is USB? For resume, it is. For > suspend, it clearly isn't. For suspend, the only other case we've seen has been the keyboard and mouse controller, which has exactly the same "we can special case it with a single 'let's do _this_ device asynchronously'". Again, it may not be pretty, but it sure is simple. Much simpler than talking about some generic infrastructure changes and about doing "let's do leaves of the tree separately" schemes. And that's why I'm _soo_ unhappy with you, and am insulting you. Because you keep on making the same mistake over and over - overdesigning. Overdesigning is a SIN. It's the archetypal example of what I call "bad taste". I get really upset when a subsystem maintainer starts overdesigning things. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-15 0:10 ` Linus Torvalds 2009-12-15 0:11 ` Linus Torvalds @ 2009-12-15 11:03 ` Rafael J. Wysocki 2009-12-15 15:26 ` Linus Torvalds 1 sibling, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-15 11:03 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tuesday 15 December 2009, Linus Torvalds wrote: > > On Tue, 15 Dec 2009, Rafael J. Wysocki wrote: > > > > Because the PCI bridges are not the only case where it matters (I'd say they > > are really a corner case). Basically, any two async devices separated by a > > series of sync ones are likely not to be suspended (or resumed) in parallel > > with each other, because the parent is usually next to its children in dpm_list. > > Give a real example that matters. I'll try. Let -> denote child-parent relationships and assume dpm_list looks like this: ..., A->B->C, D, E->F->G, ... where A, B, E, F are all async and C, D, G are sync (E, F, G may be USB and A, B, C may be serio input devices and D is a device that just happens to be in dpm_list between them). Say A and C take the majority of the total suspend time and assume we traverse the dpm_list from left to right. Now, during suspend, C waits for B that waits for A and G waits for F that waits for E. Moreover, since C is sync, the PM core won't start the suspend of D until the suspend of C has returned. In turn, since D is sync, the suspend of E won't be started until the suspend of D has returned. So in this situation the gain from the async suspends of A, B, E, F is zero. However, it won't be zero if we start the async suspends of A, B, E, F upfront. I'm not sure if this is sufficiently "real life" for you, but this is how dpm_list looks on one of my test boxes, more or less. > Really. > > How hard can it be to understand: KISS. Keep It Simple, Stupid. 
> > I get really tired of this whole stupid async discussion, because you're > overdesigning it. > > To a first approximation, THE ONLY THING THAT MATTERS IS USB. If this applies to _resume_ only, then I agree, but Arjan's data clearly show that serio devices take much more time to suspend than USB. But if we only talk about resume, the PCI bridges don't really matter, because they are resumed before all devices that depend on them, so they don't really need to wait for anyone anyway. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
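The A->B->C, D, E->F->G example in the message above can be checked with a small timing model. This is a sketch only: the per-device costs are invented for illustration (the thread gives none beyond "A and C take the majority"), the sim_* names are hypothetical, and the model ignores the async thread limit and any locking overhead. An async device starts either when the list walk reaches it (the behaviour of the posted patch) or at time zero (the "upfront" variant), and every device waits for its children before doing its own work.

```c
#include <stddef.h>

#define SIM_MAX_DEVS 16

struct sim_dev {
	const char *name;
	int parent;	/* index of the parent in the list, or -1 */
	int async;	/* nonzero if power.async_suspend would be set */
	long cost_ms;	/* hypothetical time spent in the suspend callbacks */
};

static long sim_max(long a, long b)
{
	return a > b ? a : b;
}

/*
 * Simulated wall-clock time of the suspend phase, in milliseconds.
 * The list must be in suspend order, i.e. children before their parents.
 * upfront == 0: async work starts when the walk reaches the device.
 * upfront != 0: all async work is started at time 0, before the walk.
 * Returns -1 if the list is too long for this sketch.
 */
long sim_suspend(const struct sim_dev *devs, size_t n, int upfront)
{
	long children_done[SIM_MAX_DEVS] = { 0 };
	long now = 0;	/* time as seen by the list walk */
	long end = 0;	/* when the last suspend finished */
	size_t i;

	if (n > SIM_MAX_DEVS)
		return -1;

	for (i = 0; i < n; i++) {
		long start = (devs[i].async && upfront) ? 0 : now;
		/* The device waits for its children, then does its own work. */
		long finish = sim_max(start, children_done[i]) + devs[i].cost_ms;

		if (devs[i].parent >= 0)
			children_done[devs[i].parent] =
				sim_max(children_done[devs[i].parent], finish);

		/* A sync device blocks the walk until it has finished. */
		if (!devs[i].async)
			now = finish;

		end = sim_max(end, finish);
	}
	return end;
}

/* The list from the message above, with made-up costs. */
const struct sim_dev example_list[] = {
	{ "A",  1, 1, 300 },
	{ "B",  2, 1,  10 },
	{ "C", -1, 0, 300 },
	{ "D", -1, 0,  10 },
	{ "E",  5, 1, 100 },
	{ "F",  6, 1,  10 },
	{ "G", -1, 0,  10 },
};
```

With these made-up costs, sim_suspend(example_list, 7, 0) gives 740, which is exactly the sum of all costs, so the async flags buy nothing in the in-order walk. sim_suspend(example_list, 7, 1) gives 630, because E and F now overlap with the A/B/C chain. How large the difference is on real hardware depends entirely on the real costs, which is what the measurements promised elsewhere in the thread would have to show.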
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-15 11:03 ` Rafael J. Wysocki @ 2009-12-15 15:26 ` Linus Torvalds 2009-12-15 15:55 ` Alan Stern 2009-12-16 2:11 ` Rafael J. Wysocki 0 siblings, 2 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-15 15:26 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 15 Dec 2009, Rafael J. Wysocki wrote: > > > > Give a real example that matters. > > I'll try. Let -> denote child-parent relationships and assume dpm_list looks > like this: No. I mean something real - something like - if you run on a non-PC with two USB buses behind non-PCI controllers. - device xyz. > If this applies to _resume_ only, then I agree, but Arjan's data clearly > show that serio devices take much more time to suspend than USB. I mean in general - something where you actually have hard data that some device really needs anything more than my one-liner, and really _needs_ some complex infrastructure. Not "let's imagine a case like xyz". > But if we only talk about resume, the PCI bridges don't really matter, > because they are resumed before all devices that depend on them, so they don't > really need to wait for anyone anyway. But that's my _point_. That's the whole point of the one-liner patch. Read the comment above that one-liner. My whole point was that by doing the whole "wait for children" in generic code, you also made devices - such as PCI bridges - have to wait for children, even though they don't need to, and don't want to. So I suggested an admittedly ugly hack to take care of it - rather than some complex infrastructure. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-15 15:26 ` Linus Torvalds @ 2009-12-15 15:55 ` Alan Stern 2009-12-15 16:28 ` Linus Torvalds 2009-12-16 2:11 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-15 15:55 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 15 Dec 2009, Linus Torvalds wrote: > My whole point was that by doing the whole "wait for children" in generic > code, you also made devices - such as PCI bridges - have to wait for > children, even though they don't need to, and don't want to. > > So I suggested an admittedly ugly hack to take care of it - rather than > some complex infrastructure. It doesn't feel like an ugly hack to me. It seems like exactly the Right Thing To Do: Make as many devices as possible use async suspend/resume. The only reason we don't make every device async is because we don't know whether it's safe. In the case of PCI bridges we _do_ know -- because they don't have any work to do outside of late_suspend/early_resume -- and so they _should_ be async. The same goes for devices that don't have suspend or resume methods. There remains a separate question: Should async devices also be forced to wait for their children? I don't see why not. For PCI bridges it won't make any significant difference. As long as the async code doesn't have to do anything, who cares when it runs? Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-15 15:55 ` Alan Stern @ 2009-12-15 16:28 ` Linus Torvalds 2009-12-15 18:57 ` Linus Torvalds 2009-12-15 20:26 ` Alan Stern 0 siblings, 2 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-15 16:28 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 15 Dec 2009, Alan Stern wrote: > > It doesn't feel like an ugly hack to me. It seems like exactly the > Right Thing To Do: Make as many devices as possible use async > suspend/resume. The reason it's an ugly hack is that it's actually not a simple decision to make. The devil is in the details: > The only reason we don't make every device async is because we don't > know whether it's safe. In the case of PCI bridges we _do_ know -- > because they don't have any work to do outside of > late_suspend/early_resume -- and so they _should_ be async. That's the theory, yes. And it was worth the comment to spell out that theory. But.. It's a very subtle theory, and it's not necessarily always 100% true. For example, a cardbus bridge is strictly speaking very much a PCI bridge, but for cardbus bridges we _do_ have a suspend/resume function. And perhaps worse than that, cardbus bridges are one of the canonical examples where two different PCI devices actually share registers. It's quite common that some of the control registers are shared across the two subfunctions of a two-slot cardbus controller (and we generally don't even have full docs for them!) > The same goes for devices that don't have suspend or resume methods. Yes and no. Again, the "async_suspend" flag is done at the generic device layer, but 99% of all suspend/resume methods are _not_ done at that level: they are bus-specific functions, where the bus has a generic suspend-resume function that it exposes to the generic device layer, and that knows about the bus-specific rules. 
So if you are a PCI device (to take just that example - but it's true of just about all other buses too), and you don't have any suspend or resume methods, it's actually impossible to see that fact from the generic device layer. And even when you know it's PCI, our rules are actually not simple at all. Our rules for PCI devices (and this strictly speaking is true for bridges too) are rather complex:
 - do we have _any_ legacy PM support (ie the "direct" driver suspend/resume functions in the driver ops, rather than having a "struct dev_pm_ops" pointer)? If so, call "->suspend()"
 - If not - do we have that "dev_pm_ops" thing? If so, call it.
 - If not - just disable the device entirely _UNLESS_ you're a PCI bridge.
Notice? The way things are set up, if you have no suspend routine, you'll not get suspended, but you will get disabled. So it's _not_ actually safe to asynchronously suspend a PCI device if that device has no driver or no suspend routines - because even in the absence of a driver and suspend routines, we'll still at least disable it. And if there is some subtle dependency on that device that isn't obvious (say, it might be used indirectly for some ACPI thing), then that async suspend is the wrong thing to do. Subtle? Hell yes. So the whole thing about "we can do PCI bridges asynchronously because they are obviously no-op" is kind of true - except for the "obviously" part. It's not obvious at all. It's rather subtle. As an example of this kind of subtlety - iirc PCIE bridges used to have suspend and resume bugs when we initially switched over to the "new world" suspend/resume exactly because they actually did things at "suspend" time (rather than suspend_late), and that broke devices behind them (this was not related to async, of course, but the point is that even when you look like a PCI bridge, you might be doing odd things). So just saying "let's do it asynchronously" is _not_ always guaranteed to be the right thing at all. 
It's _probably_ safe for at least regular PCI bridges. Cardbus bridges? Probably not, but since most modern laptops have just a single slot - and people who have multiple slots seldom use them all - most people will probably never see the problems that it _could_ introduce. And PCIE bridges? Should be safe these days, but it wasn't quite as obvious, because a PCIE bridge actually has a driver unlike a regular plain PCI-PCI bridge. Subtle, subtle. > There remains a separate question: Should async devices also be forced > to wait for their children? I don't see why not. For PCI bridges it > won't make any significant difference. As long as the async code > doesn't have to do anything, who cares when it runs? That's why I just set the "async_resume = 1" thing. But there might actually be reasons why we care. Like the fact that we actually throttle the amount of parallel work we do in async_schedule(). So doing even a "no-op" asynchronously isn't actually a no-op: while it is pending (and those things can be pending for a long time, since they have to wait for those slow devices underneath them), it can cause _other_ async work - that isn't necessarily a no-op at all - to be then done synchronously. Now, admittedly our async throttling limits are high enough that the above kind of detail will probably never ever really matter (default 256 worker threads etc). But it's an example of how practice is different from theory - in _theory_ it doesn't make any difference if you wait for something asynchronously, but in practice it could make a difference under some circumstances. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
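The three-step PCI suspend rule described in the message above can be written down as a small decision function. This is only a userspace sketch of the ordering as the message states it: the struct, the enum and the function names are hypothetical and are not the kernel's `struct pci_dev` or the PCI core's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a PCI device; not struct pci_dev. */
struct model_pci_dev {
    bool has_legacy_suspend; /* driver has direct suspend/resume in its driver ops */
    bool has_dev_pm_ops;     /* driver provides a "struct dev_pm_ops" pointer */
    bool is_bridge;          /* device is a PCI bridge */
};

enum model_suspend_action {
    MODEL_CALL_LEGACY_SUSPEND, /* rule 1: call the legacy ->suspend() */
    MODEL_CALL_PM_OPS,         /* rule 2: call the dev_pm_ops callback */
    MODEL_DISABLE_DEVICE,      /* rule 3: no callbacks, so disable the device */
    MODEL_DO_NOTHING,          /* rule 3 exception: bridges are left alone */
};

/* Apply the rules in the order given in the message. */
enum model_suspend_action model_pci_suspend_action(const struct model_pci_dev *dev)
{
    assert(dev != NULL);
    if (dev->has_legacy_suspend)
        return MODEL_CALL_LEGACY_SUSPEND;
    if (dev->has_dev_pm_ops)
        return MODEL_CALL_PM_OPS;
    if (!dev->is_bridge)
        return MODEL_DISABLE_DEVICE;
    return MODEL_DO_NOTHING;
}

/* Only this case is a true no-op during the suspend phase. */
bool model_suspend_is_noop(const struct model_pci_dev *dev)
{
    return model_pci_suspend_action(dev) == MODEL_DO_NOTHING;
}
```

In this model a device with no driver at all still ends up in `MODEL_DISABLE_DEVICE`, which is the point being made: "no suspend routine" does not mean "nothing happens", so it is not automatically safe to run asynchronously.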
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-15 16:28 ` Linus Torvalds @ 2009-12-15 18:57 ` Linus Torvalds 2009-12-15 20:26 ` Alan Stern 1 sibling, 0 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-15 18:57 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 15 Dec 2009, Linus Torvalds wrote: > > And even when you know it's PCI, our rules are actually not simple at all. > Our rules for PCI devices (and this strictly speaking is true for bridges > too) are rather complex: > > - do we have _any_ legacy PM support (ie the "direct" driver > suspend/resume functions in the driver ops, rather than having a > "struct dev_pm_ops" pointer)? If so, call "->suspend()" > > - If not - do we have that "dev_pm_ops" thing? If so, call it. > > - If not - just disable the device entirely _UNLESS_ you're a PCI bridge. > > Notice? The way things are set up, if you have no suspend routine, you'll > not get suspended, but you will get disabled. Side note - what I think might be a clean solution for PCI at least is to do something like the following: - move that "disable the device entirely" thing to suspend_late, rather than the earlier suspend phase. Now PCI devices without drivers or PM will not be touched at all in the first suspend phase. - initialize all PCI devices to have 'async_suspend = 1' on discovery - whenever we bind a driver to the PCI device, we'd then look at whether that driver implements suspend/resume callbacks (legacy or new), and clear the async_suspend bit if so. That way we'd have the same old synchronous behavior for all PCI suspend and resume events (unless the driver itself then sets the async_suspend bit at device init time, which it could do, of course), while still always doing async "no-op" events. 
That would avoid the ugly one-liner that just "knows" that PCI bridges are special and don't do anything at suspend time (even though they aren't really - a PCI bridge _could_ have a driver associated with it that does something that might not be happy being asynchronous). Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
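The side note above proposes defaulting every PCI device to async at discovery and dropping back to synchronous when a driver with PM callbacks binds. A minimal userspace sketch of that flag handling follows; all names are hypothetical, and the model ignores the other half of the proposal (moving the "disable the device" step to suspend_late).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver description: which kinds of PM callbacks it implements. */
struct model_driver {
    bool has_legacy_pm;  /* legacy suspend/resume in the driver ops */
    bool has_dev_pm_ops; /* new-style dev_pm_ops callbacks */
};

/* Hypothetical device: only the fields the proposal talks about. */
struct model_device {
    bool async_suspend;
    const struct model_driver *driver;
};

/* Step 2 of the proposal: every device starts out async on discovery. */
void model_device_discovered(struct model_device *dev)
{
    assert(dev != NULL);
    dev->async_suspend = true;
    dev->driver = NULL;
}

/* Step 3: binding a driver with any PM callbacks clears the async bit. */
void model_bind_driver(struct model_device *dev, const struct model_driver *drv)
{
    assert(dev != NULL && drv != NULL);
    dev->driver = drv;
    if (drv->has_legacy_pm || drv->has_dev_pm_ops)
        dev->async_suspend = false;
}

/* A driver that knows it is safe may opt back in at device init time. */
void model_driver_opt_in_async(struct model_device *dev)
{
    assert(dev != NULL);
    dev->async_suspend = true;
}
```

With this arrangement the generic layer never needs a special case for bridges: a driverless bridge stays async because nothing ever cleared its bit, and a bridge with a driver that does real work at suspend time becomes synchronous like any other device.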
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-15 16:28 ` Linus Torvalds 2009-12-15 18:57 ` Linus Torvalds @ 2009-12-15 20:26 ` Alan Stern 2009-12-15 21:26 ` Rafael J. Wysocki 2009-12-15 21:54 ` Linus Torvalds 1 sibling, 2 replies; 235+ messages in thread From: Alan Stern @ 2009-12-15 20:26 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 15 Dec 2009, Linus Torvalds wrote: > It's a very subtle theory, and it's not necessarily always 100% true. For > example, a cardbus bridge is strictly speaking very much a PCI bridge, but > for cardbus bridges we _do_ have a suspend/resume function. > > And perhaps worse than that, cardbus bridges are one of the canonical > examples where two different PCI devices actually share registers. It's > quite common that some of the control registers are shared across the two > subfunctions of a two-slot cardbus controller (and we generally don't even > have full docs for them!) Okay. This obviously implies that if/when cardbus bridges are converted to async suspend/resume, the driver should make sure that the lower-numbered devices wait for their sibling higher-numbered devices to suspend (and vice versa for resume). Awkward though it may be. > > The same goes for devices that don't have suspend or resume methods. > > Yes and no. > > Again, the "async_suspend" flag is done at the generic device layer, but > 99% of all suspend/resume methods are _not_ done at that level: they are > bus-specific functions, where the bus has a generic suspend-resume > function that it exposes to the generic device layer, and that knows about > the bus-specific rules. > > So if you are a PCI device (to take just that example - but it's true of > just about all other buses too), and you don't have any suspend or resume > methods, it's actually impossible to see that fact from the generic device > layer. Sure. 
That's why the async_suspend flag is set at the bus/driver level. > And even when you know it's PCI, our rules are actually not simple at all. > Our rules for PCI devices (and this strictly speaking is true for bridges > too) are rather complex: > > - do we have _any_ legacy PM support (ie the "direct" driver > suspend/resume functions in the driver ops, rather than having a > "struct dev_pm_ops" pointer)? If so, call "->suspend()" > > - If not - do we have that "dev_pm_ops" thing? If so, call it. > > - If not - just disable the device entirely _UNLESS_ you're a PCI bridge. > > Notice? The way things are set up, if you have no suspend routine, you'll > not get suspended, but you will get disabled. > > So it's _not_ actually safe to asynchronously suspend a PCI device if that > device has no driver or no suspend routines - because even in the absence > of a driver and suspend routines, we'll still at least disable it. And if > there is some subtle dependency on that device that isn't obvious (say, it > might be used indirectly for some ACPI thing), then that async suspend is > the wrong thing to do. > > Subtle? Hell yes. I don't disagree. However the subtlety lies mainly in the matter of non-obvious dependencies. (The other stuff is all known to the PCI core.) AFAICS there's otherwise little difference between an async routine that does nothing and one that disables the device -- both operations are very fast. The ACPI relations are definitely something to worry about. It would be a good idea, at an early stage, to add those dependencies explicitly. I don't know enough about them to say more; perhaps Rafael does. As for other non-obvious dependencies... Who knows? Probably the only way to find them is by experimentation. My guess is that they will turn out to be connected mostly with "high-level" devices: system devices, things on the motherboard -- generally speaking, stuff close to the CPU. 
Relatively few will be associated with devices below the level of a PCI device or equivalent. Ideally we would figure out how to do the slow devices in parallel without interference from fast devices having unknown dependencies. Unfortunately this may not be possible. > So the whole thing about "we can do PCI bridges asynchronously because > they are obviously no-op" is kind of true - except for the "obviously" > part. It's not obvious at all. It's rather subtle. > > As an example of this kind of subtlety - iirc PCIE bridges used to have > suspend and resume bugs when we initially switched over to the "new world" > suspend/resume exactly because they actually did things at "suspend" time > (rather than suspend_late), and that broke devices behind them (this was > not related to async, of course, but the point is that even when you look > like a PCI bridge, you might be doing odd things). > > So just saying "let's do it asynchronously" is _not_ always guaranteed to > be the right thing at all. It's _probably_ safe for at least regular PCI > bridges. Cardbus bridges? Probably not, but since most modern laptops have > just a single slot - and people who have multiple slots seldom use them > all - most people will probably never see the problems that it _could_ > introduce. > > And PCIE bridges? Should be safe these days, but it wasn't quite as > obvious, because a PCIE bridge actually has a driver unlike a regular > plain PCI-PCI bridge. > > Subtle, subtle. Indeed. Perhaps you were too hasty in suggesting that PCI bridges should be async. It would help a lot to see some device lists for typical machines. (If there are such things.) Otherwise we are just blowing gas. > > There remains a separate question: Should async devices also be forced > > to wait for their children? I don't see why not. For PCI bridges it > > won't make any significant difference. As long as the async code > > doesn't have to do anything, who cares when it runs? 
> > That's why I just set the "async_resume = 1" thing. > > But there might actually be reasons why we care. Like the fact that we > actually throttle the amount of parallel work we do in async_schedule(). > So doing even a "no-op" asynchronously isn't actually a no-op: while it is > pending (and those things can be pending for a long time, since they have > to wait for those slow devices underneath them), it can cause _other_ > async work - that isn't necessarily a no-op at all - to be then done > synchronously. > > Now, admittedly our async throttling limits are high enough that the above > kind of detail will probably never ever realy matter (default 256 worker > threads etc). But it's an example of how practice is different from theory > - in _theory_ it doesn't make any difference if you wait for something > asynchronously, but in practice it could make a difference under some > circumstances. We certainly shouldn't be worried about side effects of async throttling as this stage. KISS works both ways: Don't overdesign, and don't worry about things that might crop up when you expand the design. We have strayed off the point of your original objection: not providing a way for devices to skip waiting for their children. This really is a separate issue from deciding whether or not to go async. For example, your proposed patch makes PCI bridges async but doesn't allow them to avoid waiting for children. IMO that's a good thing. The real issue is "blockage": synchronous devices preventing possible concurrency among async devices. That's what you thought making PCI bridges async would help. In general, blockage arises in suspend when you have an async child with a synchronous parent. The parent has to wait for the child, which might take a long time, thereby delaying other unrelated devices. (This explains why you wanted to make PCI bridges async -- they are the parents of USB controllers.) For resume it's the opposite: an async parent with synchronous children. 
Thus, while making PCI bridges async might make suspend faster, it probably won't help much with resume speed. You'd have to make the children of USB devices (SCSI hosts, TTYs, and so on) async. Depending on the order of device registration, of course. Apart from all this, there's a glaring hole in the discussion so far. You and Arjan may not have noticed it, but those of us still using rotating media have to put up with disk resume times that are a factor of 100 (!) larger than USB resume times. That's where the greatest gains are to be found. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
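The "blockage" effect described in the message above (an async child under a synchronous parent stalling unrelated devices during suspend) can be illustrated with a toy timing model. Everything here is an assumption made for illustration: the device names, the durations, the unlimited supply of worker threads, and the rule that the main suspend thread only blocks on synchronous devices. It is not the PM core's algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_DEVS 32

/* One entry of a hypothetical suspend-ordered list (children before parents). */
struct sim_dev {
    const char *name;
    int parent;      /* index of the parent in the array, or -1 for none */
    int duration_ms; /* invented time the device's suspend callback takes */
    bool async;      /* true if suspended in a separate thread */
};

/*
 * Return the time at which the last device finishes suspending.
 * The main thread walks the list in order. An async device is handed off at
 * the current main-thread time and runs once its children are done; a sync
 * device makes the main thread itself wait for the children and then run
 * the callback, which delays every later device in the list.
 */
int sim_suspend_total_ms(const struct sim_dev *devs, size_t n)
{
    int finish[SIM_MAX_DEVS];
    int main_clock = 0;
    int total = 0;

    assert(n <= SIM_MAX_DEVS);
    for (size_t i = 0; i < n; i++) {
        int ready = 0;

        /* A device may not start before all of its children have finished. */
        for (size_t j = 0; j < i; j++)
            if (devs[j].parent == (int)i && finish[j] > ready)
                ready = finish[j];

        int start = ready > main_clock ? ready : main_clock;
        finish[i] = start + devs[i].duration_ms;

        if (!devs[i].async)
            main_clock = finish[i]; /* the main thread was busy until now */
        if (finish[i] > total)
            total = finish[i];
    }
    return total;
}
```

Take a list of usb-device (async, 100 ms), usb-controller (async, 10 ms, parent of the USB device), pci-bridge (0 ms, parent of the controller) and an unrelated serio device (async, 200 ms) registered after the bridge. With a synchronous bridge the model gives 310 ms, because the main thread sits at the bridge until the USB subtree is done and only then starts serio. With an async bridge it gives 200 ms, since serio starts at time 0 and overlaps the USB subtree.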
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-15 20:26 ` Alan Stern @ 2009-12-15 21:26 ` Rafael J. Wysocki 2009-12-15 22:01 ` Alan Stern 2009-12-15 21:54 ` Linus Torvalds 1 sibling, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-15 21:26 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tuesday 15 December 2009, Alan Stern wrote: > On Tue, 15 Dec 2009, Linus Torvalds wrote: > > > It's a very subtle theory, and it's not necessarily always 100% true. For > > example, a cardbus bridge is strictly speaking very much a PCI bridge, but > > for cardbus bridges we _do_ have a suspend/resume function. > > > > And perhaps worse than that, cardbus bridges are one of the canonical > > examples where two different PCI devices actually share registers. It's > > quite common that some of the control registers are shared across the two > > subfunctions of a two-slot cardbus controller (and we generally don't even > > have full docs for them!) > > Okay. This obviously implies that if/when cardbus bridges are > converted to async suspend/resume, the driver should make sure that the > lower-numbered devices wait for their sibling higher-numbered devices > to suspend (and vice versa for resume). Awkward though it may be. > > > > The same goes for devices that don't have suspend or resume methods. > > > > Yes and no. > > > > Again, the "async_suspend" flag is done at the generic device layer, but > > 99% of all suspend/resume methods are _not_ done at that level: they are > > bus-specific functions, where the bus has a generic suspend-resume > > function that it exposes to the generic device layer, and that knows about > > the bus-specific rules. 
> > > > So if you are a PCI device (to take just that example - but it's true of > > just about all other buses too), and you don't have any suspend or resume > > methods, it's actually impossible to see that fact from the generic device > > layer. > > Sure. That's why the async_suspend flag is set at the bus/driver > level. > > > And even when you know it's PCI, our rules are actually not simple at all. > > Our rules for PCI devices (and this strictly speaking is true for bridges > > too) are rather complex: > > > > - do we have _any_ legacy PM support (ie the "direct" driver > > suspend/resume functions in the driver ops, rather than having a > > "struct dev_pm_ops" pointer)? If so, call "->suspend()" > > > > - If not - do we have that "dev_pm_ops" thing? If so, call it. > > > > - If not - just disable the device entirely _UNLESS_ you're a PCI bridge. > > > > Notice? The way things are set up, if you have no suspend routine, you'll > > not get suspended, but you will get disabled. > > > > So it's _not_ actually safe to asynchronously suspend a PCI device if that > > device has no driver or no suspend routines - because even in the absense > > of a driver and suspend routines, we'll still least disable it. And if > > there is some subtle dependency on that device that isn't obvious (say, it > > might be used indirectly for some ACPI thing), then that async suspend is > > the wrong thing to do. > > > > Subtle? Hell yes. > > I don't disagree. However the subtlety lies mainly in the matter of > non-obvious dependencies. (The other stuff is all known to the PCI > core.) AFAICS there's otherwise little difference between an async > routine that does nothing and one that disables the device -- both > operations are very fast. > > The ACPI relations are definitely something to worry about. It would > be a good idea, at an early stage, to add those dependencies > explicitly. I don't know enough about them to say more; perhaps Rafael > does. 
It boils down to the fact that for each PCI device known to the ACPI BIOS there is a "shadow" ACPI device that generally has its own suspend/resume callbacks and these "shadow" devices are members of the ACPI subtree of the device tree (ie. they have parents and so on). Now, when I worked on the first version of async suspend/resume, I noticed that if those "shadow" ACPI devices did not wait for their PCI counterparts to suspend, things broke badly. The reason probably wasn't related to what they did in their suspend/resume callbacks, because they are usually empty, but it was rather related to the dependencies between devices in the ACPI subtree (so, generally speaking, it seems the entire ACPI subtree of the device tree should be suspended after the entire PCI subtree). That obviously requires more investigation, though. > As for other non-obvious dependencies... Who knows? Probably the only > way to find them is by experimentation. My guess is that they will > turn out to be connected mostly with "high-level" devices: system > devices, things on the motherboard -- generally speaking, stuff close > to the CPU. Relatively few will be associated with devices below the > level of a PCI device or equivalent. > > Ideally we would figure out how to do the slow devices in parallel > without interference from fast devices having unknown dependencies. > Unfortunately this may not be possible. I really expect to see those "unknown dependencies" in the _noirq suspend/resume phases and above. [The very fact they exist is worrisome, because that's why we don't know why things work on one system and don't work on another, although they appear to be very similar.] > > So the whole thing about "we can do PCI bridges asynchronously because > > they are obviously no-op" is kind of true - except for the "obviously" > > part. It's not obvious at all. It's rather subtle. 
> > > > As an example of this kind of subtlety - iirc PCIE bridges used to have > > suspend and resume bugs when we initially switched over to the "new world" > > suspend/resume exactly because they actually did things at "suspend" time > > (rather than suspend_late), and that broke devices behind them (this was > > not related to async, of course, but the point is that even when you look > > like a PCI bridge, you might be doing odd things). Well, those "pcieport devices" still are the children of PCIe ports, although physically they just correspond to different sets of registers within the ports' config spaces (_that_ is overdesigned IMnsHO) and they are "suspended" during the regular suspend of their PCIe port "parents". > > So just saying "let's do it asynchronously" is _not_ always guaranteed to > > be the right thing at all. It's _probably_ safe for at least regular PCI > > bridges. Cardbus bridges? Probably not, but since most modern laptop have > > just a single slot - and people who have multiple slots seldom use them > > all - most people will probably never see the problems that it _could_ > > introduce. > > > > And PCIE bridges? Should be safe these days, but it wasn't quite as > > obvious, because a PCIE bridge actually has a driver unlike a regular > > plain PCI-PCI bridge. > > > > Subtle, subtle. > > Indeed. Perhaps you were too hasty in suggesting that PCI bridges > should be async. > > It would help a lot to see some device lists for typical machines. (If > there are such things.) Otherwise we are just blowing gas. > > > > There remains a separate question: Should async devices also be forced > > > to wait for their children? I don't see why not. For PCI bridges it > > > won't make any significant difference. As long as the async code > > > doesn't have to do anything, who cares when it runs? > > > > That's why I just set the "async_resume = 1" thing. > > > > But there might actually be reasons why we care. 
Like the fact that we > > actually throttle the amount of parallel work we do in async_schedule(). > > So doing even a "no-op" asynchronously isn't actually a no-op: while it is > > pending (and those things can be pending for a long time, since they have > > to wait for those slow devices underneath them), it can cause _other_ > > async work - that isn't necessarily a no-op at all - to be then done > > synchronously. > > > > Now, admittedly our async throttling limits are high enough that the above > > kind of detail will probably never ever really matter (default 256 worker > > threads etc). But it's an example of how practice is different from theory > > - in _theory_ it doesn't make any difference if you wait for something > > asynchronously, but in practice it could make a difference under some > > circumstances. > > We certainly shouldn't be worried about side effects of async > throttling at this stage. KISS works both ways: Don't overdesign, and > don't worry about things that might crop up when you expand the design. > > We have strayed off the point of your original objection: not providing > a way for devices to skip waiting for their children. This really is a > separate issue from deciding whether or not to go async. For example, > your proposed patch makes PCI bridges async but doesn't allow them to > avoid waiting for children. IMO that's a good thing. > > The real issue is "blockage": synchronous devices preventing > possible concurrency among async devices. That's what you thought > making PCI bridges async would help. > > In general, blockage arises in suspend when you have an async child > with a synchronous parent. The parent has to wait for the child, which > might take a long time, thereby delaying other unrelated devices. Exactly, but Linus' point seems to be that's going to be rare and we should be able to special case all of the interesting cases. 
> (This explains why you wanted to make PCI bridges async -- they are the > parents of USB controllers.) For resume it's the opposite: an async > parent with synchronous children. Is that really going to happen in practice? I mean, what would be the point? > Thus, while making PCI bridges async might make suspend faster, it probably > won't help much with resume speed. You'd have to make the children of USB > devices (SCSI hosts, TTYs, and so on) async. Depending on the order of > device registration, of course. > > Apart from all this, there's a glaring hole in the discussion so far. > You and Arjan may not have noticed it, but those of us still using > rotating media have to put up with disk resume times that are a factor > of 100 (!) larger than USB resume times. That's where the greatest > gains are to be found. I guess so. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-15 21:26 ` Rafael J. Wysocki @ 2009-12-15 22:01 ` Alan Stern 0 siblings, 0 replies; 235+ messages in thread From: Alan Stern @ 2009-12-15 22:01 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 15 Dec 2009, Rafael J. Wysocki wrote: > > Ideally we would figure out how to do the slow devices in parallel > > without interference from fast devices having unknown dependencies. > > Unfortunately this may not be possible. > > I really expect to see those "unknown dependencies" in the _noirq > suspend/resume phases and above. [The very fact they exist is worrisome, > because that's why we don't know why things work on one system and don't > work on another, although they appear to be very similar.] This is a good reason for keeping the _noirq phases synchronous. AFAIK they don't take long enough to be worth converting, so there's no loss. > > The real issue is "blockage": synchronous devices preventing > > possible concurrency among async devices. That's what you thought > > making PCI bridges async would help. > > > > In general, blockage arises in suspend when you have an async child > > with a synchronous parent. The parent has to wait for the child, which > > might take a long time, thereby delaying other unrelated devices. > > Exactly, but Linus' point seems to be that's going to be rare and we > should be able to special case all of the interesting cases. Maybe that's true. Without seeing some examples of actual dpm_list contents, we can't tell. Can you post the interesting parts of the lists from some of your test machines? Maybe with a USB device or two plugged in? (The device names together with the names of their parents should be enough.) > > (This explains why you wanted to make PCI bridges async -- they are the > > parents of USB controllers.) 
For resume it's the opposite: an async > > parent with synchronous children. > > Is that really going to happen in practice? I mean, what would be the point? I don't know. It's all speculation until we see some actual lists. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-15 20:26 ` Alan Stern 2009-12-15 21:26 ` Rafael J. Wysocki @ 2009-12-15 21:54 ` Linus Torvalds 2009-12-15 22:27 ` Alan Stern 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-15 21:54 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Tue, 15 Dec 2009, Alan Stern wrote: > > Okay. This obviously implies that if/when cardbus bridges are > converted to async suspend/resume, the driver should make sure that the > lower-numbered devices wait for their sibling higher-numbered devices > to suspend (and vice versa for resume). Awkward though it may be. Yes. However, this is an excellent case where the whole "the device layer does things asynchronously" is really rather awkward. For cardbus, the nicest model really would be for the _driver_ to decide to do some things asynchronously, after having done some other things synchronously (to make sure of ordering). That said, I think we are ok for at least Yenta resume, because the really ordering-critical stuff we tend to do at "resume_early", which wouldn't be asynchronous anyway. But for an idea of what I'm talking about, look at the o2micro stuff in drivers/pcmcia/o2micro.h, and notice how it does certain things only for the "PCI_FUNC(..devfn) == 0" case. So I suspect that we _can_ just do cardbus bridges asynchronously too, but it really needs some care. I suspect to a first approximation we would want to do the easy cases first, and ignore cardbus as being "known to possibly have issues". > > Subtle? Hell yes. > > I don't disagree. However the subtlety lies mainly in the matter of > non-obvious dependencies. Yes. But we don't necessarily even _know_ those dependencies. The Cardbus ones I know about, but really only because I wrote much of that code initially when converting cardbus to look like the PCI bridge it largely is. 
But how many other cases like that do we have that we have perhaps never even hit, because we've never done anything out of order. > The ACPI relations are definitely something to worry about. It would > be a good idea, at an early stage, to add those dependencies > explicitly. I don't know enough about them to say more; perhaps Rafael > does. Quite frankly, I would really not want to do ACPI first at all. We already handle batteries specially, but any random system device? Don't touch it, is my suggestion. There are just too many ways it can fail. Don't tell me that things "should work" - we know for a fact that BIOS tables almost always have every single bug they could possibly have. > > And PCIE bridges? Should be safe these days, but it wasn't quite as > > obvious, because a PCIE bridge actually has a driver unlike a regular > > plain PCI-PCI bridge. > > > > Subtle, subtle. > > Indeed. Perhaps you were too hasty in suggesting that PCI bridges > should be async. Oh, yes. I would suggest that first we do _nothing_ async except for within just a single USB tree, and perhaps some individual drivers like the PS/2 keyboard controller (and do even that perhaps only for the PC version, which we know is on the southbridge and not anywhere else). If that ends up meaning that we block due to PCI bridges, so be it. I really would prefer baby steps over anything more complete. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-15 21:54 ` Linus Torvalds
@ 2009-12-15 22:27 ` Alan Stern
  0 siblings, 0 replies; 235+ messages in thread
From: Alan Stern @ 2009-12-15 22:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Tue, 15 Dec 2009, Linus Torvalds wrote:

> On Tue, 15 Dec 2009, Alan Stern wrote:
> >
> > Okay.  This obviously implies that if/when cardbus bridges are
> > converted to async suspend/resume, the driver should make sure that the
> > lower-numbered devices wait for their sibling higher-numbered devices
> > to suspend (and vice versa for resume).  Awkward though it may be.
>
> Yes. However, this is an excellent case where the whole "the device layer
> does things asynchronously" is really rather awkward.
>
> For cardbus, the nicest model really would be for the _driver_ to decide
> to do some things asynchronously, after having done some other things
> synchronously (to make sure of ordering).

Have you considered the possibility of augmenting the design to allow
this?  Perhaps reserve a particular return code from the suspend
routine to mean that asynchronous operations are still underway, so the
PM core shouldn't automatically do the complete_all().

> So I suspect that we _can_ just do cardbus bridges asynchronously too, but
> it really needs some care. I suspect to a first approximation we would
> want to do the easy cases first, and ignore cardbus as being "known to
> possibly have issues".

Certainly.  Start with the easy things and leave harder devices like
cardbus bridges for later.

> > > Subtle? Hell yes.
> >
> > I don't disagree.  However the subtlety lies mainly in the matter of
> > non-obvious dependencies.
>
> Yes. But we don't necessarily even _know_ those dependencies.

Yep.  Both non-obvious and non-known.

> The Cardbus ones I know about, but really only because I wrote much of
> that code initially when converting cardbus to look like the PCI bridge it
> largely is. But how many other cases like that do we have that we have
> perhaps never even hit, because we've never done anything out of order.
>
> > The ACPI relations are definitely something to worry about.  It would
> > be a good idea, at an early stage, to add those dependencies
> > explicitly.  I don't know enough about them to say more; perhaps Rafael
> > does.
>
> Quite frankly, I would really not want to do ACPI first at all.

Dear me, no!  I wasn't saying ACPI should be made async; I was saying
that ACPI "shadow" devices should be made to wait for their async PCI
counterparts.

> > Indeed.  Perhaps you were too hasty in suggesting that PCI bridges
> > should be async.
>
> Oh, yes. I would suggest that first we do _nothing_ async except for
> within just a single USB tree, and perhaps some individual drivers like
> the PS/2 keyboard controller (and do even that perhaps only for the PC
> version, which we know is on the southbridge and not anywhere else).
>
> If that ends up meaning that we block due to PCI bridges, so be it. I
> really would prefer baby steps over anything more complete.

Agreed.  I'm not in any hurry.

Alan Stern

^ permalink raw reply [flat|nested] 235+ messages in thread
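Alan's suggestion in the message above (reserve one return value of the suspend routine so that the PM core skips its automatic complete_all()) can be sketched in plain user-space C. This is only a minimal model under stated assumptions, not kernel code: struct toy_completion, struct toy_device, toy_complete_all(), core_suspend() and the value SUSPEND_ASYNC_PENDING are all hypothetical names made up for illustration, and none of them is a kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reserved return code: "async work still underway". */
#define SUSPEND_ASYNC_PENDING 1

/* Toy stand-in for the kernel's struct completion. */
struct toy_completion {
	bool done;
};

/* Hypothetical device with a single suspend callback. */
struct toy_device {
	const char *name;
	struct toy_completion completion;
	/* Returns 0, a negative error, or SUSPEND_ASYNC_PENDING. */
	int (*suspend)(struct toy_device *dev);
};

static void toy_complete_all(struct toy_completion *c)
{
	c->done = true;
}

/*
 * Model of the core's per-device suspend step: signal completion
 * automatically unless the driver says it will do so itself later.
 */
static int core_suspend(struct toy_device *dev)
{
	int ret;

	assert(dev != NULL && dev->suspend != NULL);
	dev->completion.done = false;

	ret = dev->suspend(dev);
	if (ret == SUSPEND_ASYNC_PENDING)
		return 0;	/* the driver owns the completion now */

	toy_complete_all(&dev->completion);
	return ret;
}

/* A driver that finishes everything synchronously. */
static int simple_suspend(struct toy_device *dev)
{
	(void)dev;
	return 0;
}

/* A cardbus-like driver: ordered part done here, the rest deferred. */
static int deferred_suspend(struct toy_device *dev)
{
	(void)dev;
	return SUSPEND_ASYNC_PENDING;
}

/* Called by the deferred driver once its own async work has finished. */
static void deferred_suspend_done(struct toy_device *dev)
{
	toy_complete_all(&dev->completion);
}
```

In this model a device whose callback returns 0 is marked complete by core_suspend() itself, while a device returning SUSPEND_ASYNC_PENDING stays incomplete until the driver calls deferred_suspend_done(), so anything waiting on that completion keeps waiting until the driver's own asynchronous part is over.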
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-15 15:26 ` Linus Torvalds
  2009-12-15 15:55   ` Alan Stern
@ 2009-12-16  2:11   ` Rafael J. Wysocki
  2009-12-16  6:40     ` Dmitry Torokhov
                       ` (2 more replies)
  1 sibling, 3 replies; 235+ messages in thread
From: Rafael J. Wysocki @ 2009-12-16  2:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Tuesday 15 December 2009, Linus Torvalds wrote:
>
> On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
> > >
> > > Give a real example that matters.
> >
> > I'll try.  Let -> denote child-parent relationships and assume dpm_list looks
> > like this:
>
> No.
>
> I mean something real - something like
>
>  - if you run on a non-PC with two USB buses behind non-PCI controllers.
>
>  - device xyz.
>
> > If this applies to _resume_ only, then I agree, but the Arjan's data clearly
> > show that serio devices take much more time to suspend than USB.
>
> I mean in general - something where you actually have hard data that some
> device really needs anything more than my one-liner, and really _needs_
> some complex infrastructure.
>
> Not "let's imagine a case like xyz".

As I said I would, I made some measurements.

I measured the total time of suspending and resuming devices as shown by the
code added by this patch:
http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67
on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they are quite
different and the HP was running 64-bit kernel and user space).

I took four cases into consideration:
(1) synchronous suspend and resume (/sys/power/pm_async = 0)
(2) asynchronous suspend and resume as introduced by the async branch at:
    http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async
(3) asynchronous suspend and resume like in (2), but with your one-liner setting
    the power.async_suspend flag for PCI bridges on top
(4) asynchronous suspend and resume like in (2), but with an extra patch that
    is appended on top

For those tests I set power.async_suspend for all USB devices, all serio input
devices, the ACPI battery and the USB PCI controllers (to see the impact of the
one-liner, if any).

I carried out 5 consecutive suspend-resume cycles (started from under X) on
each box in each case, and the raw data are here (all times in milliseconds):
http://www.sisk.pl/kernel/data/async-suspend.pdf

The summarized data are below (the "big" numbers are averages and the +/-
numbers are standard deviations, all in milliseconds):

                           HP nx6325          MSI Wind U100

sync suspend               1482 (+/- 40)      1180 (+/- 24)
sync resume                2955 (+/- 2)       3597 (+/- 25)

async suspend              1553 (+/- 49)      1177 (+/- 32)
async resume               2692 (+/- 326)     3556 (+/- 33)

async+one-liner suspend    1600 (+/- 39)      1212 (+/- 41)
async+one-liner resume     2692 (+/- 324)     3579 (+/- 24)

async+extra suspend        1496 (+/- 37)      1217 (+/- 38)
async+extra resume         1859 (+/- 114)     1923 (+/- 35)

So, in my opinion, with the above set of "async" devices, it doesn't
make sense to do async suspend at all, because the sync suspend is actually
the fastest on both machines.

However, it surely is worth doing async _resume_ with the extra patch appended
below, because that allows us to save 1 second or more on both machines with
respect to the sync case.  The other variants of async resume also produce some
time savings, but (on the nx6325) at the expense of huge fluctuations from one
cycle to another (so they can actually be slower than the sync resume).
Only the async resume with the extra patch is consistently better than the sync
one.  The impact of the one-liner is either negligible or slightly negative.

Now, what does the extra patch do?  Exactly the thing I was talking about, it
starts all async suspends and resumes upfront.

So, it looks like we both were wrong.  I was wrong, because I thought the extra
patch would help suspend, but not resume, while in fact it appears to help
resume big time.  You were wrong, because you thought that the one-liner would
have positive impact, while in fact it doesn't.

Concluding, at this point I'd opt for implementing asynchronous resume alone,
_without_ asynchronous suspend, which is more complicated and doesn't really
give us any time savings.  At the same time, I'd implement the asynchronous
resume in such a way that all of the async resume threads would be started
before the synchronous suspend thread, because that would give us the best
results.

Rafael

---
 drivers/base/power/main.c |   48 +++++++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 17 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -523,14 +523,9 @@ static void async_resume(void *data, asy
 
 static int device_resume(struct device *dev)
 {
-	INIT_COMPLETION(dev->power.completion);
-
-	if (pm_async_enabled && dev->power.async_suspend
-	    && !pm_trace_is_enabled()) {
-		get_device(dev);
-		async_schedule(async_resume, dev);
+	if (dev->power.async_suspend && pm_async_enabled
+	    && !pm_trace_is_enabled())
 		return 0;
-	}
 
 	return __device_resume(dev, pm_transition, false);
 }
@@ -545,14 +540,28 @@ static int device_resume(struct device *
 static void dpm_resume(pm_message_t state)
 {
 	struct list_head list;
+	struct device *dev;
 	ktime_t starttime = ktime_get();
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
 	pm_transition = state;
-	while (!list_empty(&dpm_list)) {
-		struct device *dev = to_device(dpm_list.next);
+	list_for_each_entry(dev, &dpm_list, power.entry) {
+		if (dev->power.status < DPM_OFF)
+			continue;
+
+		INIT_COMPLETION(dev->power.completion);
+
+		if (dev->power.async_suspend && pm_async_enabled
+		    && !pm_trace_is_enabled()) {
+			get_device(dev);
+			async_schedule(async_resume, dev);
+		}
+	}
+
+	while (!list_empty(&dpm_list)) {
+		dev = to_device(dpm_list.next);
 
 		get_device(dev);
 		if (dev->power.status >= DPM_OFF) {
 			int error;
@@ -809,13 +818,8 @@ static void async_suspend(void *data, as
 
 static int device_suspend(struct device *dev)
 {
-	INIT_COMPLETION(dev->power.completion);
-
-	if (pm_async_enabled && dev->power.async_suspend) {
-		get_device(dev);
-		async_schedule(async_suspend, dev);
+	if (pm_async_enabled && dev->power.async_suspend)
 		return 0;
-	}
 
 	return __device_suspend(dev, pm_transition, false);
 }
@@ -827,6 +831,7 @@ static int device_suspend(struct device
 static int dpm_suspend(pm_message_t state)
 {
 	struct list_head list;
+	struct device *dev;
 	ktime_t starttime = ktime_get();
 	int error = 0;
 
@@ -834,9 +839,18 @@ static int dpm_suspend(pm_message_t stat
 	mutex_lock(&dpm_list_mtx);
 	pm_transition = state;
 	async_error = 0;
-	while (!list_empty(&dpm_list)) {
-		struct device *dev = to_device(dpm_list.prev);
+	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
+		INIT_COMPLETION(dev->power.completion);
+
+		if (pm_async_enabled && dev->power.async_suspend) {
+			get_device(dev);
+			async_schedule(async_suspend, dev);
+		}
+	}
+
+	while (!list_empty(&dpm_list)) {
+		dev = to_device(dpm_list.prev);
 
 		get_device(dev);
 		mutex_unlock(&dpm_list_mtx);
 

^ permalink raw reply [flat|nested] 235+ messages in thread
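Why starting all async resumes upfront can help, as the message above reports, can be illustrated with a toy timing model in user-space C. It is a sketch under loud assumptions, not a description of the measured systems: devices have no dependencies on each other, any number of async threads can run in parallel at no cost, and struct toy_dev, resume_time_inline() and resume_time_upfront() are hypothetical names. The durations used with it are made up and are not taken from the measurements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry of a toy device list; all values are made up for illustration. */
struct toy_dev {
	unsigned int resume_ms;	/* time the device's resume callback takes */
	bool async;		/* resumed in its own thread if true */
};

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Async work is started only when the list walk reaches the device,
 * so it is delayed by every synchronous device ahead of it.
 */
static unsigned int resume_time_inline(const struct toy_dev *devs, size_t n)
{
	unsigned int now = 0;		/* clock of the walking thread */
	unsigned int last_async = 0;	/* latest async finish time */
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (devs[i].async)
			last_async = max_u(last_async, now + devs[i].resume_ms);
		else
			now += devs[i].resume_ms;
	}
	return max_u(now, last_async);
}

/*
 * All async work is started in a first pass at time zero, before the
 * walk handles the synchronous devices.
 */
static unsigned int resume_time_upfront(const struct toy_dev *devs, size_t n)
{
	unsigned int sync_total = 0;
	unsigned int longest_async = 0;
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (devs[i].async)
			longest_async = max_u(longest_async, devs[i].resume_ms);
		else
			sync_total += devs[i].resume_ms;
	}
	return max_u(sync_total, longest_async);
}
```

For a hypothetical list of four devices in list order (300 ms sync, 1000 ms async, 500 ms sync, 800 ms async) the model gives 1600 ms when async work starts as the walk reaches it and 1000 ms when it all starts upfront, against 2600 ms fully synchronous. Real devices wait for their parents, so actual gains depend on the device tree.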
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-16  2:11 ` Rafael J. Wysocki
@ 2009-12-16  6:40   ` Dmitry Torokhov
  2009-12-18 22:43     ` Rafael J. Wysocki
  2009-12-16 15:22   ` Alan Stern
  2009-12-16 15:47   ` Linus Torvalds
  2 siblings, 1 reply; 235+ messages in thread
From: Dmitry Torokhov @ 2009-12-16  6:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki wrote:
> [...]
> So, in my opinion, with the above set of "async" devices, it doesn't
> make sense to do async suspend at all, because the sync suspend is actually
> the fastest on both machines.

I think the async suspend is not asynchronous enough then - what kind of
time do you get if you simply comment out call to psmouse_reset() in
drivers/input/mouse/psmouse-base.c:psmouse_cleanup()? (Just for testing
purposes only, I don't think we want to do that by default.)

-- 
Dmitry

^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-16  6:40 ` Dmitry Torokhov
@ 2009-12-18 22:43   ` Rafael J. Wysocki
  2009-12-19 19:59     ` Dmitry Torokhov
  0 siblings, 1 reply; 235+ messages in thread
From: Rafael J. Wysocki @ 2009-12-18 22:43 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Linus Torvalds, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Wednesday 16 December 2009, Dmitry Torokhov wrote:
> On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki wrote:
> > [...]
> > So, in my opinion, with the above set of "async" devices, it doesn't
> > make sense to do async suspend at all, because the sync suspend is actually
> > the fastest on both machines.
>
> I think the async suspend is not asynchronous enough then - what kind of
> time do you get if you simply comment out call to psmouse_reset() in
> drivers/input/mouse/psmouse-base.c:psmouse_cleanup()? (Just for testing
> purposes only, I don't think we want to do that by default.)

The problem apparently is that the i8042 suspend/resume is synchronous.

Do you think it's safe to mark it as asynchronous?

Rafael

^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-18 22:43 ` Rafael J. Wysocki
@ 2009-12-19 19:59   ` Dmitry Torokhov
  2009-12-19 21:33     ` Rafael J. Wysocki
  0 siblings, 1 reply; 235+ messages in thread
From: Dmitry Torokhov @ 2009-12-19 19:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote:
> On Wednesday 16 December 2009, Dmitry Torokhov wrote:
> > [...]
> > I think the async suspend is not asynchronous enough then - what kind of
> > time do you get if you simply comment out call to psmouse_reset() in
> > drivers/input/mouse/psmouse-base.c:psmouse_cleanup()? (Just for testing
> > purposes only, I don't think we want to do that by default.)
>
> The problem apparently is that the i8042 suspend/resume is synchronous.
>
> Do you think it's safe to mark it as asynchronous?
>

Umm.. there lie dragons. There is an implicit relationship between i8042
and PNP/ACPI devices representing keyboard and mouse ports, and I am not
sure how happy i8042 (and most importantly the BIOS) will be if they get
shut down before i8042. Also there is EC which is in theory independent
but in practice not so much.

-- 
Dmitry

^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-19 19:59 ` Dmitry Torokhov
@ 2009-12-19 21:33   ` Rafael J. Wysocki
  2009-12-19 22:29     ` Rafael J. Wysocki
  2009-12-19 22:47     ` Dmitry Torokhov
  0 siblings, 2 replies; 235+ messages in thread
From: Rafael J. Wysocki @ 2009-12-19 21:33 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Linus Torvalds, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Saturday 19 December 2009, Dmitry Torokhov wrote:
> On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote:
> > [...]
> > The problem apparently is that the i8042 suspend/resume is synchronous.
> >
> > Do you think it's safe to mark it as asynchronous?
> >
>
> Umm.. there lie dragons. There is an implicit relationship between i8042
> and PNP/ACPI devices representing keyboard and mouse ports, and I am not
> sure how happy i8042 (and most importantly the BIOS) will be if they get
> shut down before i8042. Also there is EC which is in theory independent
> but in practice not so much.

I see.

Is this possible to identify ACPI devices that should wait for the i8042
suspend and that should be waited for by it on resume?

Rafael

^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-19 21:33 ` Rafael J. Wysocki
@ 2009-12-19 22:29   ` Rafael J. Wysocki
  2009-12-19 22:43     ` Dmitry Torokhov
  2009-12-19 22:47   ` Dmitry Torokhov
  1 sibling, 1 reply; 235+ messages in thread
From: Rafael J. Wysocki @ 2009-12-19 22:29 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Linus Torvalds, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Saturday 19 December 2009, Rafael J. Wysocki wrote:
> On Saturday 19 December 2009, Dmitry Torokhov wrote:
> > [...]
> > Umm.. there lie dragons. There is an implicit relationship between i8042
> > and PNP/ACPI devices representing keyboard and mouse ports, and I am not
> > sure how happy i8042 (and most importantly the BIOS) will be if they get
> > shut down before i8042. Also there is EC which is in theory independent
> > but in practice not so much.
>
> I see.
>
> Is this possible to identify ACPI devices that should wait for the i8042
> suspend and that should be waited for by it on resume?

Wait, if you look at the logs at

http://www.sisk.pl/kernel/data/nx6325/
http://www.sisk.pl/kernel/data/wind/

you'll see that the i8042 suspend is called before any ACPI devices are
suspended anyway.  In fact, it is suspended right after its serio children
which is very early in the suspend sequence.

So, it seems, if there were any problems with i8042 vs ACPI, we'd experience
them anyway.

Rafael

^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 22:29 ` Rafael J. Wysocki @ 2009-12-19 22:43 ` Dmitry Torokhov 0 siblings, 0 replies; 235+ messages in thread From: Dmitry Torokhov @ 2009-12-19 22:43 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Dec 19, 2009, at 2:29 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > On Saturday 19 December 2009, Rafael J. Wysocki wrote: >> On Saturday 19 December 2009, Dmitry Torokhov wrote: >>> On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote: >>>> On Wednesday 16 December 2009, Dmitry Torokhov wrote: >>>>> On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki wrote: >>>>>> On Tuesday 15 December 2009, Linus Torvalds wrote: >>>>>>> >>>>>>> On Tue, 15 Dec 2009, Rafael J. Wysocki wrote: >>>>>>>>> >>>>>>>>> Give a real example that matters. >>>>>>>> >>>>>>>> I'll try. Let -> denote child-parent relationships and >>>>>>>> assume dpm_list looks >>>>>>>> like this: >>>>>>> >>>>>>> No. >>>>>>> >>>>>>> I mean something real - something like >>>>>>> >>>>>>> - if you run on a non-PC with two USB buses behind non-PCI >>>>>>> controllers. >>>>>>> >>>>>>> - device xyz. >>>>>>> >>>>>>>> If this applies to _resume_ only, then I agree, but the >>>>>>>> Arjan's data clearly >>>>>>>> show that serio devices take much more time to suspend than >>>>>>>> USB. >>>>>>> >>>>>>> I mean in general - something where you actually have hard >>>>>>> data that some >>>>>>> device really needs anythign more than my one-liner, and >>>>>>> really _needs_ >>>>>>> some complex infrastructure. >>>>>>> >>>>>>> Not "let's imagine a case like xyz". >>>>>> >>>>>> As I said I would, I made some measurements. 
>>>>>> >>>>>> I measured the total time of suspending and resuming devices as >>>>>> shown by the >>>>>> code added by this patch: >>>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67 >>>>>> on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they >>>>>> are quite >>>>>> different and the HP was running 64-bit kernel and user space). >>>>>> >>>>>> I took four cases into consideration: >>>>>> (1) synchronous suspend and resume (/sys/power/pm_async = 0) >>>>>> (2) asynchronous suspend and resume as introduced by the async >>>>>> branch at: >>>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async >>>>>> (3) asynchronous suspend and resume like in (2), but with your >>>>>> one-liner setting >>>>>> the power.async_suspend flag for PCI bridges on top >>>>>> (4) asynchronous suspend and resume like in (2), but with an >>>>>> extra patch that >>>>>> is appended on top >>>>>> >>>>>> For those tests I set power.async_suspend for all USB devices, >>>>>> all serio input >>>>>> devices, the ACPI battery and the USB PCI controllers (to see >>>>>> the impact of the >>>>>> one-liner, if any). 
>>>>>> >>>>>> I carried out 5 consecutive suspend-resume cycles (started from >>>>>> under X) on >>>>>> each box in each case, and the raw data are here (all times in >>>>>> milliseconds): >>>>>> http://www.sisk.pl/kernel/data/async-suspend.pdf >>>>>> >>>>>> The summarized data are below (the "big" numbers are averages >>>>>> and the +/- >>>>>> numbers are standard deviations, all in milliseconds): >>>>>> >>>>>> HP nx6325 MSI Wind U100 >>>>>> >>>>>> sync suspend 1482 (+/- 40) 1180 (+/- 24) >>>>>> sync resume 2955 (+/- 2) 3597 (+/- 25) >>>>>> >>>>>> async suspend 1553 (+/- 49) 1177 (+/- 32) >>>>>> async resume 2692 (+/- 326) 3556 (+/- 33) >>>>>> >>>>>> async+one-liner suspend 1600 (+/- 39) 1212 (+/- 41) >>>>>> async+one-liner resume 2692 (+/- 324) 3579 (+/- 24) >>>>>> >>>>>> async+extra suspend 1496 (+/- 37) 1217 (+/- 38) >>>>>> async+extra resume 1859 (+/- 114) 1923 (+/- 35) >>>>>> >>>>>> So, in my opinion, with the above set of "async" devices, it >>>>>> doesn't >>>>>> make sense to do async suspend at all, because the sync suspend >>>>>> is actually >>>>>> the fastest on both machines. >>>>> >>>>> I think the async suspend is not asynchronous enough then - what >>>>> kind of >>>>> time do you get if you simply comment out call to psmouse_reset >>>>> () in >>>>> drivers/input/mouse/psmouse-base.c:psmouse_cleanup()? (Just for >>>>> testing >>>>> purposes only, I don't think we want to do that by default.) >>>> >>>> The problem apparently is that the i8042 suspend/resume is >>>> synchronous. >>>> >>>> Do you think it's safe to mark it as asynchronous? >>>> >>> >>> Umm.. there lie dragons. There is an implicit relationship between >>> i8042 >>> and PNP/ACPI devices representing keyboard and mouse ports, and I >>> am not >>> sure how happy i8042 (and most importantly the BIOS) will be if >>> they get >>> shut down before i8042. Also there is EC which is in theory >>> independent >>> but in practice not so much. >> >> I see. 
>> >> Is this possible to identify ACPI devices that should wait for the >> i8042 >> suspend and that should be waited for by it on resume? > > Wait, if you look at the logs at > > http://www.sisk.pl/kernel/data/nx6325/ > http://www.sisk.pl/kernel/data/wind/ > > you'll see that the i8042 suspend is called before any ACPI devices > are > suspended anyway. In fact, it is suspended right after its serio > children > which is very early in the suspend sequence. Right, and we do want to "suspend" i8042 (well, reset it to the initial state we found it in at bootup) before touching ACPI. If i8042 is async, then since the psmouse reset takes a long time, it is possible that we start suspending PNP before we are done with i8042. -- Dmitry ^ permalink raw reply [flat|nested] 235+ messages in thread
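The ordering concern in the message above can be modelled in user space. This is a toy sketch, not kernel code: `threading.Event` stands in for a kernel completion, the `Device` class and the `pnp-kbd` name are hypothetical, and the sleep durations are made up. It shows a fast device being held back until a slow asynchronous i8042 suspend has finished.

```python
import threading
import time

log = []
log_lock = threading.Lock()


def record(msg):
    """Append a message to the shared log under a lock."""
    with log_lock:
        log.append(msg)


class Device:
    """Toy device: suspend() sleeps for `cost` seconds, then signals completion."""

    def __init__(self, name, cost, waits_for=()):
        self.name = name
        self.cost = cost
        self.waits_for = tuple(waits_for)  # devices that must finish suspending first
        self.done = threading.Event()      # plays the role of a completion

    def suspend(self):
        for dep in self.waits_for:
            dep.done.wait()                # block until the dependency has suspended
        record(f"{self.name}: suspend start")
        time.sleep(self.cost)
        record(f"{self.name}: suspend done")
        self.done.set()


def suspend_all_async(devices):
    """Start one thread per device and wait for all of them to finish."""
    threads = [threading.Thread(target=d.suspend) for d in devices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_demo():
    log.clear()
    i8042 = Device("i8042", cost=0.05)                         # slow (psmouse reset)
    pnp_kbd = Device("pnp-kbd", cost=0.01, waits_for=[i8042])  # hypothetical PNP node
    # The PNP device's thread is started first on purpose; it still waits.
    suspend_all_async([pnp_kbd, i8042])
    return list(log)


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

Without the `waits_for` entry the two threads would simply race, which is the situation described above. The hard part in practice is knowing which PNP/ACPI devices belong in that list.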
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 21:33 ` Rafael J. Wysocki 2009-12-19 22:29 ` Rafael J. Wysocki @ 2009-12-19 22:47 ` Dmitry Torokhov 2009-12-19 23:10 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Dmitry Torokhov @ 2009-12-19 22:47 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Dec 19, 2009, at 1:33 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > On Saturday 19 December 2009, Dmitry Torokhov wrote: >> On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote: >>> On Wednesday 16 December 2009, Dmitry Torokhov wrote: >>>> On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki wrote: >>>>> On Tuesday 15 December 2009, Linus Torvalds wrote: >>>>>> >>>>>> On Tue, 15 Dec 2009, Rafael J. Wysocki wrote: >>>>>>>> >>>>>>>> Give a real example that matters. >>>>>>> >>>>>>> I'll try. Let -> denote child-parent relationships and assume >>>>>>> dpm_list looks >>>>>>> like this: >>>>>> >>>>>> No. >>>>>> >>>>>> I mean something real - something like >>>>>> >>>>>> - if you run on a non-PC with two USB buses behind non-PCI >>>>>> controllers. >>>>>> >>>>>> - device xyz. >>>>>> >>>>>>> If this applies to _resume_ only, then I agree, but the >>>>>>> Arjan's data clearly >>>>>>> show that serio devices take much more time to suspend than USB. >>>>>> >>>>>> I mean in general - something where you actually have hard data >>>>>> that some >>>>>> device really needs anythign more than my one-liner, and really >>>>>> _needs_ >>>>>> some complex infrastructure. >>>>>> >>>>>> Not "let's imagine a case like xyz". >>>>> >>>>> As I said I would, I made some measurements. 
>>>>> >>>>> I measured the total time of suspending and resuming devices as >>>>> shown by the >>>>> code added by this patch: >>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67 >>>>> on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they >>>>> are quite >>>>> different and the HP was running 64-bit kernel and user space). >>>>> >>>>> I took four cases into consideration: >>>>> (1) synchronous suspend and resume (/sys/power/pm_async = 0) >>>>> (2) asynchronous suspend and resume as introduced by the async >>>>> branch at: >>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async >>>>> (3) asynchronous suspend and resume like in (2), but with your >>>>> one-liner setting >>>>> the power.async_suspend flag for PCI bridges on top >>>>> (4) asynchronous suspend and resume like in (2), but with an >>>>> extra patch that >>>>> is appended on top >>>>> >>>>> For those tests I set power.async_suspend for all USB devices, >>>>> all serio input >>>>> devices, the ACPI battery and the USB PCI controllers (to see >>>>> the impact of the >>>>> one-liner, if any). 
>>>>> >>>>> I carried out 5 consecutive suspend-resume cycles (started from >>>>> under X) on >>>>> each box in each case, and the raw data are here (all times in >>>>> milliseconds): >>>>> http://www.sisk.pl/kernel/data/async-suspend.pdf >>>>> >>>>> The summarized data are below (the "big" numbers are averages >>>>> and the +/- >>>>> numbers are standard deviations, all in milliseconds): >>>>> >>>>> HP nx6325 MSI Wind U100 >>>>> >>>>> sync suspend 1482 (+/- 40) 1180 (+/- 24) >>>>> sync resume 2955 (+/- 2) 3597 (+/- 25) >>>>> >>>>> async suspend 1553 (+/- 49) 1177 (+/- 32) >>>>> async resume 2692 (+/- 326) 3556 (+/- 33) >>>>> >>>>> async+one-liner suspend 1600 (+/- 39) 1212 (+/- 41) >>>>> async+one-liner resume 2692 (+/- 324) 3579 (+/- 24) >>>>> >>>>> async+extra suspend 1496 (+/- 37) 1217 (+/- 38) >>>>> async+extra resume 1859 (+/- 114) 1923 (+/- 35) >>>>> >>>>> So, in my opinion, with the above set of "async" devices, it >>>>> doesn't >>>>> make sense to do async suspend at all, because the sync suspend >>>>> is actually >>>>> the fastest on both machines. >>>> >>>> I think the async suspend is not asynchronous enough then - what >>>> kind of >>>> time do you get if you simply comment out call to psmouse_reset() >>>> in >>>> drivers/input/mouse/psmouse-base.c:psmouse_cleanup()? (Just for >>>> testing >>>> purposes only, I don't think we want to do that by default.) >>> >>> The problem apparently is that the i8042 suspend/resume is >>> synchronous. >>> >>> Do you think it's safe to mark it as asynchronous? >>> >> >> Umm.. there lie dragons. There is an implicit relationship between >> i8042 >> and PNP/ACPI devices representing keyboard and mouse ports, and I >> am not >> sure how happy i8042 (and most importantly the BIOS) will be if >> they get >> shut down before i8042. Also there is EC which is in theory >> independent >> but in practice not so much. > > I see. 
> > Is this possible to identify ACPI devices that should wait for the > i8042 > suspend and that should be waited for by it on resume? We could try to add some dependencies while discovering PNP to get KBC addresses in i8042, but we need to make sure we do it even in the presence of i8042.nopnp. -- Dmitry ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 22:47 ` Dmitry Torokhov @ 2009-12-19 23:10 ` Rafael J. Wysocki 2009-12-19 23:22 ` Dmitry Torokhov 2009-12-19 23:23 ` Linus Torvalds 0 siblings, 2 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-19 23:10 UTC (permalink / raw) To: Dmitry Torokhov Cc: Linus Torvalds, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Saturday 19 December 2009, Dmitry Torokhov wrote: > On Dec 19, 2009, at 1:33 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > On Saturday 19 December 2009, Dmitry Torokhov wrote: > >> On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote: > >>> On Wednesday 16 December 2009, Dmitry Torokhov wrote: > >>>> On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki wrote: > >>>>> On Tuesday 15 December 2009, Linus Torvalds wrote: > >>>>>> > >>>>>> On Tue, 15 Dec 2009, Rafael J. Wysocki wrote: > >>>>>>>> > >>>>>>>> Give a real example that matters. > >>>>>>> > >>>>>>> I'll try. Let -> denote child-parent relationships and assume > >>>>>>> dpm_list looks > >>>>>>> like this: > >>>>>> > >>>>>> No. > >>>>>> > >>>>>> I mean something real - something like > >>>>>> > >>>>>> - if you run on a non-PC with two USB buses behind non-PCI > >>>>>> controllers. > >>>>>> > >>>>>> - device xyz. > >>>>>> > >>>>>>> If this applies to _resume_ only, then I agree, but the > >>>>>>> Arjan's data clearly > >>>>>>> show that serio devices take much more time to suspend than USB. > >>>>>> > >>>>>> I mean in general - something where you actually have hard data > >>>>>> that some > >>>>>> device really needs anythign more than my one-liner, and really > >>>>>> _needs_ > >>>>>> some complex infrastructure. > >>>>>> > >>>>>> Not "let's imagine a case like xyz". > >>>>> > >>>>> As I said I would, I made some measurements. 
> >>>>> > >>>>> I measured the total time of suspending and resuming devices as > >>>>> shown by the > >>>>> code added by this patch: > >>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67 > >>>>> on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they > >>>>> are quite > >>>>> different and the HP was running 64-bit kernel and user space). > >>>>> > >>>>> I took four cases into consideration: > >>>>> (1) synchronous suspend and resume (/sys/power/pm_async = 0) > >>>>> (2) asynchronous suspend and resume as introduced by the async > >>>>> branch at: > >>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async > >>>>> (3) asynchronous suspend and resume like in (2), but with your > >>>>> one-liner setting > >>>>> the power.async_suspend flag for PCI bridges on top > >>>>> (4) asynchronous suspend and resume like in (2), but with an > >>>>> extra patch that > >>>>> is appended on top > >>>>> > >>>>> For those tests I set power.async_suspend for all USB devices, > >>>>> all serio input > >>>>> devices, the ACPI battery and the USB PCI controllers (to see > >>>>> the impact of the > >>>>> one-liner, if any). 
> >>>>> > >>>>> I carried out 5 consecutive suspend-resume cycles (started from > >>>>> under X) on > >>>>> each box in each case, and the raw data are here (all times in > >>>>> milliseconds): > >>>>> http://www.sisk.pl/kernel/data/async-suspend.pdf > >>>>> > >>>>> The summarized data are below (the "big" numbers are averages > >>>>> and the +/- > >>>>> numbers are standard deviations, all in milliseconds): > >>>>> > >>>>> HP nx6325 MSI Wind U100 > >>>>> > >>>>> sync suspend 1482 (+/- 40) 1180 (+/- 24) > >>>>> sync resume 2955 (+/- 2) 3597 (+/- 25) > >>>>> > >>>>> async suspend 1553 (+/- 49) 1177 (+/- 32) > >>>>> async resume 2692 (+/- 326) 3556 (+/- 33) > >>>>> > >>>>> async+one-liner suspend 1600 (+/- 39) 1212 (+/- 41) > >>>>> async+one-liner resume 2692 (+/- 324) 3579 (+/- 24) > >>>>> > >>>>> async+extra suspend 1496 (+/- 37) 1217 (+/- 38) > >>>>> async+extra resume 1859 (+/- 114) 1923 (+/- 35) > >>>>> > >>>>> So, in my opinion, with the above set of "async" devices, it > >>>>> doesn't > >>>>> make sense to do async suspend at all, because the sync suspend > >>>>> is actually > >>>>> the fastest on both machines. > >>>> > >>>> I think the async suspend is not asynchronous enough then - what > >>>> kind of > >>>> time do you get if you simply comment out call to psmouse_reset() > >>>> in > >>>> drivers/input/mouse/psmouse-base.c:psmouse_cleanup()? (Just for > >>>> testing > >>>> purposes only, I don't think we want to do that by default.) > >>> > >>> The problem apparently is that the i8042 suspend/resume is > >>> synchronous. > >>> > >>> Do you think it's safe to mark it as asynchronous? > >>> > >> > >> Umm.. there lie dragons. There is an implicit relationship between > >> i8042 > >> and PNP/ACPI devices representing keyboard and mouse ports, and I > >> am not > >> sure how happy i8042 (and most importantly the BIOS) will be if > >> they get > >> shut down before i8042. Also there is EC which is in theory > >> independent > >> but in practice not so much. 
> > > > I see. > > > > Is this possible to identify ACPI devices that should wait for the > > i8042 > > suspend and that should be waited for by it on resume? > > We could try to add some dependencies while discovering PNP to get KBC > addresses in i8042 but we need to make sure we do it even in the presence > of i8042.nopnp. Well, I guess this is the example of the off-tree dependencies that actually matter Linus wanted. :-) I guess there are quite a few devices that can depend on the i8042 in principle, is this correct? Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 23:10 ` Rafael J. Wysocki @ 2009-12-19 23:22 ` Dmitry Torokhov 2009-12-19 23:33 ` Rafael J. Wysocki 2009-12-19 23:23 ` Linus Torvalds 1 sibling, 1 reply; 235+ messages in thread From: Dmitry Torokhov @ 2009-12-19 23:22 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Dec 19, 2009, at 3:10 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > On Saturday 19 December 2009, Dmitry Torokhov wrote: >> On Dec 19, 2009, at 1:33 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote: >> >>> On Saturday 19 December 2009, Dmitry Torokhov wrote: >>>> On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote: >>>>> On Wednesday 16 December 2009, Dmitry Torokhov wrote: >>>>>> On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki >>>>>> wrote: >>>>>>> On Tuesday 15 December 2009, Linus Torvalds wrote: >>>>>>>> >>>>>>>> On Tue, 15 Dec 2009, Rafael J. Wysocki wrote: >>>>>>>>>> >>>>>>>>>> Give a real example that matters. >>>>>>>>> >>>>>>>>> I'll try. Let -> denote child-parent relationships and assume >>>>>>>>> dpm_list looks >>>>>>>>> like this: >>>>>>>> >>>>>>>> No. >>>>>>>> >>>>>>>> I mean something real - something like >>>>>>>> >>>>>>>> - if you run on a non-PC with two USB buses behind non-PCI >>>>>>>> controllers. >>>>>>>> >>>>>>>> - device xyz. >>>>>>>> >>>>>>>>> If this applies to _resume_ only, then I agree, but the >>>>>>>>> Arjan's data clearly >>>>>>>>> show that serio devices take much more time to suspend than >>>>>>>>> USB. >>>>>>>> >>>>>>>> I mean in general - something where you actually have hard data >>>>>>>> that some >>>>>>>> device really needs anythign more than my one-liner, and really >>>>>>>> _needs_ >>>>>>>> some complex infrastructure. >>>>>>>> >>>>>>>> Not "let's imagine a case like xyz". >>>>>>> >>>>>>> As I said I would, I made some measurements. 
>>>>>>> >>>>>>> I measured the total time of suspending and resuming devices as >>>>>>> shown by the >>>>>>> code added by this patch: >>>>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67 >>>>>>> on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they >>>>>>> are quite >>>>>>> different and the HP was running 64-bit kernel and user space). >>>>>>> >>>>>>> I took four cases into consideration: >>>>>>> (1) synchronous suspend and resume (/sys/power/pm_async = 0) >>>>>>> (2) asynchronous suspend and resume as introduced by the async >>>>>>> branch at: >>>>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async >>>>>>> (3) asynchronous suspend and resume like in (2), but with your >>>>>>> one-liner setting >>>>>>> the power.async_suspend flag for PCI bridges on top >>>>>>> (4) asynchronous suspend and resume like in (2), but with an >>>>>>> extra patch that >>>>>>> is appended on top >>>>>>> >>>>>>> For those tests I set power.async_suspend for all USB devices, >>>>>>> all serio input >>>>>>> devices, the ACPI battery and the USB PCI controllers (to see >>>>>>> the impact of the >>>>>>> one-liner, if any). 
>>>>>>> >>>>>>> I carried out 5 consecutive suspend-resume cycles (started from >>>>>>> under X) on >>>>>>> each box in each case, and the raw data are here (all times in >>>>>>> milliseconds): >>>>>>> http://www.sisk.pl/kernel/data/async-suspend.pdf >>>>>>> >>>>>>> The summarized data are below (the "big" numbers are averages >>>>>>> and the +/- >>>>>>> numbers are standard deviations, all in milliseconds): >>>>>>> >>>>>>> HP nx6325 MSI Wind U100 >>>>>>> >>>>>>> sync suspend 1482 (+/- 40) 1180 (+/- 24) >>>>>>> sync resume 2955 (+/- 2) 3597 (+/- 25) >>>>>>> >>>>>>> async suspend 1553 (+/- 49) 1177 (+/- 32) >>>>>>> async resume 2692 (+/- 326) 3556 (+/- 33) >>>>>>> >>>>>>> async+one-liner suspend 1600 (+/- 39) 1212 (+/- 41) >>>>>>> async+one-liner resume 2692 (+/- 324) 3579 (+/- 24) >>>>>>> >>>>>>> async+extra suspend 1496 (+/- 37) 1217 (+/- 38) >>>>>>> async+extra resume 1859 (+/- 114) 1923 (+/- 35) >>>>>>> >>>>>>> So, in my opinion, with the above set of "async" devices, it >>>>>>> doesn't >>>>>>> make sense to do async suspend at all, because the sync suspend >>>>>>> is actually >>>>>>> the fastest on both machines. >>>>>> >>>>>> I think the async suspend is not asynchronous enough then - what >>>>>> kind of >>>>>> time do you get if you simply comment out call to psmouse_reset() >>>>>> in >>>>>> drivers/input/mouse/psmouse-base.c:psmouse_cleanup()? (Just for >>>>>> testing >>>>>> purposes only, I don't think we want to do that by default.) >>>>> >>>>> The problem apparently is that the i8042 suspend/resume is >>>>> synchronous. >>>>> >>>>> Do you think it's safe to mark it as asynchronous? >>>>> >>>> >>>> Umm.. there lie dragons. There is an implicit relationship between >>>> i8042 >>>> and PNP/ACPI devices representing keyboard and mouse ports, and I >>>> am not >>>> sure how happy i8042 (and most importantly the BIOS) will be if >>>> they get >>>> shut down before i8042. Also there is EC which is in theory >>>> independent >>>> but in practice not so much. 
>>> >>> I see. >>> >>> Is this possible to identify ACPI devices that should wait for the >>> i8042 >>> suspend and that should be waited for by it on resume? >> >> We could try to add some dependencies while discovering PNP to get >> KBC >> addresses in i8042 but we need to make sure we do it even in the presence >> of i8042.nopnp. > > Well, I guess this is the example of the off-tree dependencies that > actually > matter Linus wanted. :-) > > I guess there are quite a few devices that can depend on the i8042 in > principle, is this correct? The devices that depend on i8042 are the serio ports that are its children. i8042 itself may have an indirect dependency on a couple of PNP devices. I hope this answers your question... -- Dmitry ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 23:22 ` Dmitry Torokhov @ 2009-12-19 23:33 ` Rafael J. Wysocki 0 siblings, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-19 23:33 UTC (permalink / raw) To: Dmitry Torokhov Cc: Linus Torvalds, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sunday 20 December 2009, Dmitry Torokhov wrote: > On Dec 19, 2009, at 3:10 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > On Saturday 19 December 2009, Dmitry Torokhov wrote: > >> On Dec 19, 2009, at 1:33 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > >> > >>> On Saturday 19 December 2009, Dmitry Torokhov wrote: > >>>> On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote: > >>>>> On Wednesday 16 December 2009, Dmitry Torokhov wrote: > >>>>>> On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki > >>>>>> wrote: > >>>>>>> On Tuesday 15 December 2009, Linus Torvalds wrote: > >>>>>>>> > >>>>>>>> On Tue, 15 Dec 2009, Rafael J. Wysocki wrote: > >>>>>>>>>> > >>>>>>>>>> Give a real example that matters. > >>>>>>>>> > >>>>>>>>> I'll try. Let -> denote child-parent relationships and assume > >>>>>>>>> dpm_list looks > >>>>>>>>> like this: > >>>>>>>> > >>>>>>>> No. > >>>>>>>> > >>>>>>>> I mean something real - something like > >>>>>>>> > >>>>>>>> - if you run on a non-PC with two USB buses behind non-PCI > >>>>>>>> controllers. > >>>>>>>> > >>>>>>>> - device xyz. > >>>>>>>> > >>>>>>>>> If this applies to _resume_ only, then I agree, but the > >>>>>>>>> Arjan's data clearly > >>>>>>>>> show that serio devices take much more time to suspend than > >>>>>>>>> USB. > >>>>>>>> > >>>>>>>> I mean in general - something where you actually have hard data > >>>>>>>> that some > >>>>>>>> device really needs anythign more than my one-liner, and really > >>>>>>>> _needs_ > >>>>>>>> some complex infrastructure. > >>>>>>>> > >>>>>>>> Not "let's imagine a case like xyz". 
> >>>>>>> > >>>>>>> As I said I would, I made some measurements. > >>>>>>> > >>>>>>> I measured the total time of suspending and resuming devices as > >>>>>>> shown by the > >>>>>>> code added by this patch: > >>>>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67 > >>>>>>> on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they > >>>>>>> are quite > >>>>>>> different and the HP was running 64-bit kernel and user space). > >>>>>>> > >>>>>>> I took four cases into consideration: > >>>>>>> (1) synchronous suspend and resume (/sys/power/pm_async = 0) > >>>>>>> (2) asynchronous suspend and resume as introduced by the async > >>>>>>> branch at: > >>>>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async > >>>>>>> (3) asynchronous suspend and resume like in (2), but with your > >>>>>>> one-liner setting > >>>>>>> the power.async_suspend flag for PCI bridges on top > >>>>>>> (4) asynchronous suspend and resume like in (2), but with an > >>>>>>> extra patch that > >>>>>>> is appended on top > >>>>>>> > >>>>>>> For those tests I set power.async_suspend for all USB devices, > >>>>>>> all serio input > >>>>>>> devices, the ACPI battery and the USB PCI controllers (to see > >>>>>>> the impact of the > >>>>>>> one-liner, if any). 
> >>>>>>> > >>>>>>> I carried out 5 consecutive suspend-resume cycles (started from > >>>>>>> under X) on > >>>>>>> each box in each case, and the raw data are here (all times in > >>>>>>> milliseconds): > >>>>>>> http://www.sisk.pl/kernel/data/async-suspend.pdf > >>>>>>> > >>>>>>> The summarized data are below (the "big" numbers are averages > >>>>>>> and the +/- > >>>>>>> numbers are standard deviations, all in milliseconds): > >>>>>>> > >>>>>>> HP nx6325 MSI Wind U100 > >>>>>>> > >>>>>>> sync suspend 1482 (+/- 40) 1180 (+/- 24) > >>>>>>> sync resume 2955 (+/- 2) 3597 (+/- 25) > >>>>>>> > >>>>>>> async suspend 1553 (+/- 49) 1177 (+/- 32) > >>>>>>> async resume 2692 (+/- 326) 3556 (+/- 33) > >>>>>>> > >>>>>>> async+one-liner suspend 1600 (+/- 39) 1212 (+/- 41) > >>>>>>> async+one-liner resume 2692 (+/- 324) 3579 (+/- 24) > >>>>>>> > >>>>>>> async+extra suspend 1496 (+/- 37) 1217 (+/- 38) > >>>>>>> async+extra resume 1859 (+/- 114) 1923 (+/- 35) > >>>>>>> > >>>>>>> So, in my opinion, with the above set of "async" devices, it > >>>>>>> doesn't > >>>>>>> make sense to do async suspend at all, because the sync suspend > >>>>>>> is actually > >>>>>>> the fastest on both machines. > >>>>>> > >>>>>> I think the async suspend is not asynchronous enough then - what > >>>>>> kind of > >>>>>> time do you get if you simply comment out call to psmouse_reset() > >>>>>> in > >>>>>> drivers/input/mouse/psmouse-base.c:psmouse_cleanup()? (Just for > >>>>>> testing > >>>>>> purposes only, I don't think we want to do that by default.) > >>>>> > >>>>> The problem apparently is that the i8042 suspend/resume is > >>>>> synchronous. > >>>>> > >>>>> Do you think it's safe to mark it as asynchronous? > >>>>> > >>>> > >>>> Umm.. there lie dragons. 
There is an implicit relationship between > >>>> i8042 > >>>> and PNP/ACPI devices representing keyboard and mouse ports, and I > >>>> am not > >>>> sure how happy i8042 (and most importantly the BIOS) will be if > >>>> they get > >>>> shut down before i8042. Also there is EC which is in theory > >>>> independent > >>>> but in practice not so much. > >>> > >>> I see. > >>> > >>> Is this possible to identify ACPI devices that should wait for the > >>> i8042 > >>> suspend and that should be waited for by it on resume? > >> > >> We could try to add some dependencies while discovering PNP to get > >> KBC > >> addresses in i8042 but we need to make sure we do it even in the presence > >> of i8042.nopnp. > > > > Well, I guess this is the example of the off-tree dependencies that > > actually > > matter Linus wanted. :-) > > > > I guess there are quite a few devices that can depend on the i8042 in > > principle, is this correct? > > The devices that depend on i8042 are the serio ports that are its > children. That I already knew. :-) > i8042 itself may have an indirect dependency on a couple of PNP devices. I was really asking about these. > I hope this answers your question... Yes, thanks. ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 23:10 ` Rafael J. Wysocki 2009-12-19 23:22 ` Dmitry Torokhov @ 2009-12-19 23:23 ` Linus Torvalds 2009-12-19 23:40 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-19 23:23 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Dmitry Torokhov, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sun, 20 Dec 2009, Rafael J. Wysocki wrote: > > Well, I guess this is the example of the off-tree dependencies that actually > matter Linus wanted. :-) It's also the kind of dependency where I say "if we get into these kinds of messes, then the whole async crap isn't worth it". Really. Having to try to match things up with ACPI and PnP is a nightmare. Especially since I doubt Windows does anything like this, which means that there's no reason for BIOS vendors to do the tables so that we'd even know. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 23:23 ` Linus Torvalds @ 2009-12-19 23:40 ` Rafael J. Wysocki 2009-12-19 23:46 ` Linus Torvalds 2009-12-20 3:59 ` Alan Stern 0 siblings, 2 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-19 23:40 UTC (permalink / raw) To: Linus Torvalds Cc: Dmitry Torokhov, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sunday 20 December 2009, Linus Torvalds wrote: > > On Sun, 20 Dec 2009, Rafael J. Wysocki wrote: > > > > Well, I guess this is the example of the off-tree dependencies that actually > > matter Linus wanted. :-) > > It's also the kind of dependency where I say "if we get into these kinds > of messes, then the whole async crap isn't worth it". > > Really. Having to try to match things up with ACPI and PnP is a nightmare. > Especially since I doubt Windows does anything like this, which means that > there's no reason for BIOS vendors to do the tables so that we'd even > know. OK, so this means we can just forget about suspending/resuming i8042 asynchronously, which is a pity, because that gave us some real suspend speedup on my test systems. Well, whatever. So, seriously, do you think it makes sense to do asynchronous suspend at all? I'm asking because we're likely to run into trouble like this during suspend for other kinds of devices too, and without resolving it we won't get any significant speedup from asynchronous suspend. That said, to me it's definitely worth doing asynchronous resume with the "start asynch threads upfront" modification, as the results of the tests show that quite clearly. I hope you agree. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
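Why starting the async threads up front could help resume can be illustrated with a small timing model. It rests on an assumed reading of that modification: all async devices are kicked off in a first pass, before the synchronous devices are handled in list order. The device names and costs are hypothetical, and the model ignores parent/child dependencies and assumes unlimited parallelism.

```python
# Each device is (name, cost in ms, is_async). Names and costs are made up.
DEVICES = [
    ("pci-bridge", 100, False),
    ("usb-hub", 300, True),
    ("i8042", 200, False),
    ("serio-mouse", 400, True),
]


def total_sync(devices):
    """Every device handled one after another on the main thread."""
    return sum(cost for _name, cost, _is_async in devices)


def total_async_in_order(devices):
    """Async devices are started only when the list walk reaches them, so a
    slow sync device earlier in the list delays their start."""
    now = 0      # time on the main thread
    finish = 0   # latest completion time seen so far
    for _name, cost, is_async in devices:
        if is_async:
            finish = max(finish, now + cost)  # runs in the background from `now`
        else:
            now += cost                       # blocks the main thread
            finish = max(finish, now)
    return finish


def total_async_upfront(devices):
    """All async devices are started at time 0, then sync devices run in order."""
    async_finish = max((c for _n, c, a in devices if a), default=0)
    sync_finish = sum(c for _n, c, a in devices if not a)
    return max(async_finish, sync_finish)


if __name__ == "__main__":
    print("sync:           ", total_sync(DEVICES), "ms")
    print("async, in order:", total_async_in_order(DEVICES), "ms")
    print("async, upfront: ", total_async_upfront(DEVICES), "ms")
```

In this toy list the in-order walk cannot start the last async device until both synchronous ones are done, while the upfront variant overlaps it with them. Real devices also wait on their parents, so actual gains depend on the shape of the device tree.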
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 23:40 ` Rafael J. Wysocki @ 2009-12-19 23:46 ` Linus Torvalds 2009-12-19 23:47 ` Linus Torvalds 2009-12-19 23:53 ` Rafael J. Wysocki 2009-12-20 3:59 ` Alan Stern 1 sibling, 2 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-19 23:46 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Dmitry Torokhov, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sun, 20 Dec 2009, Rafael J. Wysocki wrote: > > OK, so this means we can just forget about suspending/resuming i8042 > asynchronously, which is a pity, because that gave us some real suspend > speedup on my test systems. No. What it means is that you shouldn't try to come up with these idiotic scenarios just trying to make trouble for yourself, and using it as an excuse for crap. I suggest you try to treat the i8042 controller async, and see if it is problematic. If it isn't, don't do that then. But we actually have no real reason to believe that it would be problematic, at least on a PC where the actual logic is on the SB (presumably behind the LPC controller). Why would it be? The fact that PnP and ACPI enumerates those devices has exactly _what_ to do with anything? Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 23:46 ` Linus Torvalds @ 2009-12-19 23:47 ` Linus Torvalds 2009-12-19 23:54 ` Rafael J. Wysocki 2009-12-19 23:53 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-19 23:47 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Dmitry Torokhov, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sat, 19 Dec 2009, Linus Torvalds wrote: > > I suggest you try to treat the i8042 controller async, and see if it is > problematic. If it isn't, don't do that then. I obviously meant: "If it _is_ problematic, don't do that then". "Is", not "isn't". Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 23:47 ` Linus Torvalds @ 2009-12-19 23:54 ` Rafael J. Wysocki 0 siblings, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-19 23:54 UTC (permalink / raw) To: Linus Torvalds Cc: Dmitry Torokhov, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sunday 20 December 2009, Linus Torvalds wrote: > > On Sat, 19 Dec 2009, Linus Torvalds wrote: > > > > I suggest you try to treat the i8042 controller async, and see if it is > > problematic. If it isn't, don't do that then. > > I obviously meant: "If it _is_ problematic, don't do that then". "Is", not > "isn't". Sure, I understood that was a typo. :-) Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 23:46 ` Linus Torvalds 2009-12-19 23:47 ` Linus Torvalds @ 2009-12-19 23:53 ` Rafael J. Wysocki 2009-12-20 0:09 ` Linus Torvalds 2009-12-20 2:45 ` Async suspend-resume patch w/ completions (was: Re: Async suspend-resume " Dmitry Torokhov 1 sibling, 2 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-19 23:53 UTC (permalink / raw) To: Linus Torvalds Cc: Dmitry Torokhov, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sunday 20 December 2009, Linus Torvalds wrote: > > On Sun, 20 Dec 2009, Rafael J. Wysocki wrote: > > > > OK, so this means we can just forget about suspending/resuming i8042 > > asynchronously, which is a pity, because that gave us some real suspend > > speedup on my test systems. > > No. What it means is that you shouldn't try to come up with these idiotic > scenarios just trying to make trouble for yourself, I haven't. I've just asked Dmitry for his opinion and got it. The fact that you don't like it doesn't mean it's actually "idiotic". > and using it as an excuse for crap. I'm not sure what you mean exactly, but whatever. > I suggest you try to treat the i8042 controller async, and see if it is > problematic. I already have and I don't see problems with it, but quite obviously I can't test all possible configurations out there. > If it isn't, don't do that then. But we actually have no real > reason to believe that it would be problematic, at least on a PC where the > actual logic is on the SB (presumably behind the LPC controller). > > Why would it be? The embedded controller may depend on it. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 23:53 ` Rafael J. Wysocki @ 2009-12-20 0:09 ` Linus Torvalds 2009-12-20 0:35 ` Rafael J. Wysocki 2009-12-20 2:41 ` Dmitry Torokhov 2009-12-20 2:45 ` Async suspend-resume patch w/ completions (was: Re: Async suspend-resume " Dmitry Torokhov 1 sibling, 2 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-20 0:09 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Dmitry Torokhov, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sun, 20 Dec 2009, Rafael J. Wysocki wrote: > > > > Why would it be? > > The embedded controller may depend on it. Again, I say "why?" Anything can be true. That doesn't _make_ everything true. There's no real reason why PnP/ACPI suspend/resume should really care. We can try it. Not for 2.6.33, but by the 34 merge window maybe we'll have a patch-series that is ready to be tested, and that aggressively tries to do the devices that matter asynchronously. So instead of you trying to make up some idiotic cross-device worries, just see if those worries have any actual background in reality. So far I haven't actually heard anything but "in theory, anything is possible", which is such a truism that it's not even worth voicing. That said, I still get the feeling that we'd be even better off simply trying to avoid the whole keyboard reset entirely. Apparently we do it for a few HP laptops. It's entirely possible that we'd be better off simply not _doing_ the slow thing in the first place. For example, we may be _much_ better off doing that whole keyboard reset at resume time than at suspend time. That's what we do when we probe things on initialization - and the resume-time keyboard code is actually already asynchronous, it does that atkbd_reconnect asynchronously by queuing it as an event. 
So again, all these problems may not at all be fundamental problems: the keyboard driver does certain things, but there is no guarantee that it _needs_ to do those things. Turning the driver async may be totally the wrong thing to do, when we could potentially fix latency problems at the driver level instead. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-20 0:09 ` Linus Torvalds @ 2009-12-20 0:35 ` Rafael J. Wysocki 2009-12-20 2:41 ` Dmitry Torokhov 1 sibling, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-20 0:35 UTC (permalink / raw) To: Linus Torvalds Cc: Dmitry Torokhov, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sunday 20 December 2009, Linus Torvalds wrote: > > On Sun, 20 Dec 2009, Rafael J. Wysocki wrote: > > > > > > Why would it be? > > > > The embedded controller may depend on it. > > Again, I say "why?" > > Anything can be true. That doesn't _make_ everything true. There's no real > reason why PnP/ACPI suspend/resume should really care. > > We can try it. Not for 2.6.33, but by the 34 merge window maybe we'll have > a patch-series that is ready to be tested, and that aggressively tries to > do the devices that matter asynchronously. Yes, I'd like to have such a patch series for 2.6.34. So far I've been able to confirm that doing serio+i8042, USB and ACPI battery asynchronously may give us significant time savings, especially during resume. > So instead of you trying to make up some idiotic cross-device worries, > just see if those worries have any actual background in reality. So far I > haven't actually heard anything but "in theory, anything is possible", > which is such a truism that it's not even worth voicing. > > That said, I still get the feeling that we'd be even better off simply > trying to avoid the whole keyboard reset entirely. Apparently we do it for > a few HP laptops. It's entirely possible that we'd be better off simply > not _doing_ the slow thing in the first place. That very well may be the case, but I'm not the right person to confirm or deny that. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-20 0:09 ` Linus Torvalds 2009-12-20 0:35 ` Rafael J. Wysocki @ 2009-12-20 2:41 ` Dmitry Torokhov 2009-12-20 19:25 ` [linux-pm] " Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Dmitry Torokhov @ 2009-12-20 2:41 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list, Vojtech Pavlik On Sat, Dec 19, 2009 at 04:09:07PM -0800, Linus Torvalds wrote: > > That said, I still get the feeling that we'd be even better off simply > trying to avoid the whole keyboard reset entirely. Apparently we do it for > a few HP laptops. I was mistaken, HP laptops do not like mouse disabled when suspending, not sure about the rest of the state. > It's entirely possible that we'd be better off simply > not _doing_ the slow thing in the first place. > The reset appeared first in 2.5.42. I expect that some BIOSes get very confused when they find mouse speaking something that they do not understand (i.e. synaptics, ALPS or anything else that is not bare PS/2 or intellimouse), but maybe Vojtech remembers better? > For example, we may be _much_ better off doing that whole keyboard reset > at resume time than at suspend time. We do the reset for different reasons - at resume we want the device in known state to ensure that it properly responds to the probes we send to it. At suspend we are trying to reset things into original state so that the firmware will not be confused. If we want to try to live without reset we could do PSMOUSE_CMD_RESET_DIS instead of PSMOUSE_CMD_RESET_BAT which is much heavier. We should probably not wait for .34 then because the bulk of testing will happen only when .33 is close to be released because that's when most of regular users will start using the new code and try to suspend and resume. 
Rafael, how long does suspend take if you change call to psmouse_reset() in psmouse_cleanup() to ps2_command(&psmouse->ps2dev, NULL, PSMOUSE_CMD_RESET_DIS)? And do the same for atkbd... BTW, making just serio asynchronous while keeping i8042 synchronous makes no sense because I serialize access to i8042 - the thing does not survive simultaneous [command] access to both keyboard and mouse... -- Dmitry ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [linux-pm] Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-20 2:41 ` Dmitry Torokhov @ 2009-12-20 19:25 ` Rafael J. Wysocki 2009-12-21 7:39 ` [linux-pm] Async suspend-resume patch w/ completions (was: Re: Async?suspend-resume " Dmitry Torokhov 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-20 19:25 UTC (permalink / raw) To: linux-pm Cc: Dmitry Torokhov, Linus Torvalds, LKML, ACPI Devel Maling List, Vojtech Pavlik On Sunday 20 December 2009, Dmitry Torokhov wrote: > On Sat, Dec 19, 2009 at 04:09:07PM -0800, Linus Torvalds wrote: > > > > That said, I still get the feeling that we'd be even better off simply > > trying to avoid the whole keyboard reset entirely. Apparently we do it for > > a few HP laptops. > > I was mistaken, HP laptops do not like mouse disabled when suspending, > not sure about the rest of the state. > > > It's entirely possible that we'd be better off simply > > not _doing_ the slow thing in the first place. > > > > The reset appeared first in 2.5.42. I expect that some BIOSes get very > confused when tehy find mouse speaking something that they do not > unserstand (i.e. synaptics, ALPS or anything else that is not bare PS/2 > or intellimouse), but maybe Vojtech remembers better? > > > For example, we may be _much_ better off doing that whole keyboard reset > > at resume time than at suspend time. > > We do the reset for the different reasons - at resume we want the device > in known state to ensure that it properly responds to the probes we > send to it. At suspend we trying to reset things into original state so > that the firmware will not be confused. > > If we want to try to live without reset we could to PSMOUSE_CMD_RESET_DIS > instead of PSMOUSE_CMD_RESET_BAT which is much heavier. 
We should > probably not wait for .34 then because the bulk of testing will happen > only when .33 is close to be released because that's when most of > regular users will start using the new code and try to suspend and > resume. > > Rafael, how long does suspend take if you change call to psmouse_reset() > in psmouse_cleanup() to ps2_command(&psmouse->ps2dev, NULL, PSMOUSE_CMD_RESET_DIS)? > And do the same for atkbd... On the nx6325 that appears to reduce the suspend time so much that the effect of async is not visible any more. On the Wind it decreases the total suspend time almost by half! Please push this patch to Linus. :-) > BTW, making just serio asynchronous while keeping i8042 synchronous > makes no sense because I serialize access to i8042 - the thing does not > survive simultaneous [command] access to both keyboard and mouse... OK Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [linux-pm] Async suspend-resume patch w/ completions (was: Re: Async?suspend-resume patch w/ rwsems) 2009-12-20 19:25 ` [linux-pm] " Rafael J. Wysocki @ 2009-12-21 7:39 ` Dmitry Torokhov 2009-12-21 11:20 ` Vojtech Pavlik 0 siblings, 1 reply; 235+ messages in thread From: Dmitry Torokhov @ 2009-12-21 7:39 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-pm, Linus Torvalds, LKML, ACPI Devel Maling List, Vojtech Pavlik On Sun, Dec 20, 2009 at 08:25:25PM +0100, Rafael J. Wysocki wrote: > On Sunday 20 December 2009, Dmitry Torokhov wrote: > > On Sat, Dec 19, 2009 at 04:09:07PM -0800, Linus Torvalds wrote: > > > > > > That said, I still get the feeling that we'd be even better off simply > > > trying to avoid the whole keyboard reset entirely. Apparently we do it for > > > a few HP laptops. > > > > I was mistaken, HP laptops do not like mouse disabled when suspending, > > not sure about the rest of the state. > > > > > It's entirely possible that we'd be better off simply > > > not _doing_ the slow thing in the first place. > > > > > > > The reset appeared first in 2.5.42. I expect that some BIOSes get very > > confused when tehy find mouse speaking something that they do not > > unserstand (i.e. synaptics, ALPS or anything else that is not bare PS/2 > > or intellimouse), but maybe Vojtech remembers better? > > > > > For example, we may be _much_ better off doing that whole keyboard reset > > > at resume time than at suspend time. > > > > We do the reset for the different reasons - at resume we want the device > > in known state to ensure that it properly responds to the probes we > > send to it. At suspend we trying to reset things into original state so > > that the firmware will not be confused. > > > > If we want to try to live without reset we could to PSMOUSE_CMD_RESET_DIS > > instead of PSMOUSE_CMD_RESET_BAT which is much heavier. 
We should > > probably not wait for .34 then because the bulk of testing will happen > > only when .33 is close to be released because that's when most of > > regular users will start using the new code and try to suspend and > > resume. > > > > Rafael, how long does suspend take if you change call to psmouse_reset() > > in psmouse_cleanup() to ps2_command(&psmouse->ps2dev, NULL, PSMOUSE_CMD_RESET_DIS)? > > And do the same for atkbd... > > On the nx6325 that appears to reduce the suspend time as much so the effect > of async is not visible any more. On the Wind it decreases the total suspend > time almost by half! > > Please push this patch to Linus. :-) > Let's see if I manage to solicit some testers first. FWIW it seems to be working on my boxes. But if this works then I am not sure we even want to bother with async suspend of i8042 and serios. And serio already does resume asynchronously through kseriod. -- Dmitry ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [linux-pm] Async suspend-resume patch w/ completions (was: Re: Async?suspend-resume patch w/ rwsems) 2009-12-21 7:39 ` [linux-pm] Async suspend-resume patch w/ completions (was: Re: Async?suspend-resume " Dmitry Torokhov @ 2009-12-21 11:20 ` Vojtech Pavlik 0 siblings, 0 replies; 235+ messages in thread From: Vojtech Pavlik @ 2009-12-21 11:20 UTC (permalink / raw) To: Dmitry Torokhov Cc: Rafael J. Wysocki, linux-pm, Linus Torvalds, LKML, ACPI Devel Maling List On Sun, Dec 20, 2009 at 11:39:15PM -0800, Dmitry Torokhov wrote: > > On the nx6325 that appears to reduce the suspend time as much so the effect > > of async is not visible any more. On the Wind it decreases the total suspend > > time almost by half! > > > > Please push this patch to Linus. :-) > > > > Let's see if I manage to solicit some testers first. FWIW it seems to be > working on my boxes. > > But if this works then I am not sure we even want to bother with async > suspend of i8042 and serios. And serio already does resume > asynchronously through kseriod. I'm kind of wondering where this will break, but I don't remember why the RESET_BAT was put in exactly - the point of making sure the BIOS doesn't get confused by the advanced modes is correct, and is required at least when a keyboard is set to "Set 3", but RESET_BAT is a too heavy hammer anyway - we could just make sure to switch the kbd/mouse to 'default' modes instead of doing a full reset. -- Vojtech Pavlik Director SuSE Labs ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 23:53 ` Rafael J. Wysocki 2009-12-20 0:09 ` Linus Torvalds @ 2009-12-20 2:45 ` Dmitry Torokhov 1 sibling, 0 replies; 235+ messages in thread From: Dmitry Torokhov @ 2009-12-20 2:45 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sun, Dec 20, 2009 at 12:53:45AM +0100, Rafael J. Wysocki wrote: > On Sunday 20 December 2009, Linus Torvalds wrote: > > > > If it isn't, don't do that then. But we actually have no real > > reason to believe that it would be problematic, at least on a PC where the > > actual logic is on the SB (presumably behind the LPC controller). > > > > Why would it be? > > The embedded controller may depend on it. > No, not really depend but rather weird things may happen if you are accessing both. Witness regressions where touching embedded controller makes us lose data from touchpad, I think you are CCed on that bug. -- Dmitry ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 23:40 ` Rafael J. Wysocki 2009-12-19 23:46 ` Linus Torvalds @ 2009-12-20 3:59 ` Alan Stern 2009-12-20 12:52 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-20 3:59 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Dmitry Torokhov, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sun, 20 Dec 2009, Rafael J. Wysocki wrote: > So, seriously, do you think it makes sense to do asynchronous suspend at all? > I'm asking, because we're likely to get into troubles like this during suspend > for other kinds of devices too and without resolving them we won't get any > significant speedup from asynchronous suspend. > > That said, to me it's definitely worth doing asynchronous resume with the > "start asynch threads upfront" modification, as the results of the tests show > that quite clearly. I hope you agree. It's too early to come to this sort of conclusion (i.e., that suspend and resume react very differently to an asynchronous approach). Unless you have some definite _reason_ for thinking that resume will benefit more than suspend, you shouldn't try to generalize so much from tests on only two systems. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-20 3:59 ` Alan Stern @ 2009-12-20 12:52 ` Rafael J. Wysocki 2009-12-20 17:12 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-20 12:52 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Dmitry Torokhov, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sunday 20 December 2009, Alan Stern wrote: > On Sun, 20 Dec 2009, Rafael J. Wysocki wrote: > > > So, seriously, do you think it makes sense to do asynchronous suspend at all? > > I'm asking, because we're likely to get into troubles like this during suspend > > for other kinds of devices too and without resolving them we won't get any > > significant speedup from asynchronous suspend. > > > > That said, to me it's definitely worth doing asynchronous resume with the > > "start asynch threads upfront" modification, as the results of the tests show > > that quite clearly. I hope you agree. > > It's too early to come to this sort of conclusion (i.e., that suspend > and resume react very differently to an asynchronous approach). Unless > you have some definite _reason_ for thinking that resume will benefit > more than suspend, you shouldn't try to generalize so much from tests > on only two systems. In fact I have one reason. Namely, the things that drivers do on suspend and resume are evidently quite different and on these two systems I was able to test they apparently took different amounts of time to complete. The very fact that on both systems resume is substantially longer than suspend, even if all devices are suspended and resumed synchronously, is quite interesting. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-20 12:52 ` Rafael J. Wysocki @ 2009-12-20 17:12 ` Alan Stern 2009-12-20 18:10 ` Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-20 17:12 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Dmitry Torokhov, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sun, 20 Dec 2009, Rafael J. Wysocki wrote: > > It's too early to come to this sort of conclusion (i.e., that suspend > > and resume react very differently to an asynchronous approach). Unless > > you have some definite _reason_ for thinking that resume will benefit > > more than suspend, you shouldn't try to generalize so much from tests > > on only two systems. > > In fact I have one reason. Namely, the things that drivers do on suspend and > resume are evidently quite different and on these two systems I was able to > test they apparently took different amounts of time to complete. > > The very fact that on both systems resume is substantially longer than suspend, > even if all devices are suspended and resumed synchronously, is quite > interesting. Yes, it is. But it doesn't mean that suspend won't benefit from asynchronicity; it just means that the benefits might not be as large as they are for resume. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-20 17:12 ` Alan Stern @ 2009-12-20 18:10 ` Rafael J. Wysocki 2009-12-20 19:38 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-20 18:10 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Dmitry Torokhov, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sunday 20 December 2009, Alan Stern wrote: > On Sun, 20 Dec 2009, Rafael J. Wysocki wrote: > > > > It's too early to come to this sort of conclusion (i.e., that suspend > > > and resume react very differently to an asynchronous approach). Unless > > > you have some definite _reason_ for thinking that resume will benefit > > > more than suspend, you shouldn't try to generalize so much from tests > > > on only two systems. > > > > In fact I have one reason. Namely, the things that drivers do on suspend and > > resume are evidently quite different and on these two systems I was able to > > test they apparently took different amounts of time to complete. > > > > The very fact that on both systems resume is substantially longer than suspend, > > even if all devices are suspended and resumed synchronously, is quite > > interesting. > > Yes, it is. But it doesn't mean that suspend won't benefit from > asynchronicity; it just means that the benefits might not be as large > as they are for resume. Agreed, although that raises the question whether they are sufficiently significant. I guess time will tell. With the i8042 done asynchronously they are IMO. BTW, what's the right place to call device_enable_async_suspend() for USB devices? Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-20 18:10 ` Rafael J. Wysocki @ 2009-12-20 19:38 ` Alan Stern 2009-12-20 19:51 ` Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-20 19:38 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Dmitry Torokhov, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sun, 20 Dec 2009, Rafael J. Wysocki wrote: > BTW, what's the right place to call device_enable_async_suspend() for USB > devices? For USB devices, it's in drivers/usb/core/hub.c:usb_new_device() anywhere before the call to usb_device_add(). For USB interfaces, it's in drivers/usb/core/message.c:usb_set_configuration() before the call to device_add(). For USB endpoints, it's in drivers/usb/core/endpoint.c:usb_create_ep_devs() before the call to device_register(). However you won't need to do it for interfaces and endpoints if you automatically treat as async any device without suspend/resume callbacks. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-20 19:38 ` Alan Stern @ 2009-12-20 19:51 ` Rafael J. Wysocki 0 siblings, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-20 19:51 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Dmitry Torokhov, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sunday 20 December 2009, Alan Stern wrote: > On Sun, 20 Dec 2009, Rafael J. Wysocki wrote: > > > BTW, what's the right place to call device_enable_async_suspend() for USB > > devices? > > For USB devices, it's in drivers/usb/core/hub.c:usb_new_device() > anywhere before the call to usb_device_add(). > > For USB interfaces, it's in > drivers/usb/core/message.c:usb_set_configuration() before the call to > device_add(). > > For USB endpoints, it's in > drivers/usb/core/endpoint.c:usb_create_ep_devs() before the call to > device_register(). Thanks! > However you won't need to do it for interfaces and endpoints if you > automatically treat as async any device without suspend/resume > callbacks. I don't do that right now and I need these settings just for testing at the moment. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-16 2:11 ` Rafael J. Wysocki 2009-12-16 6:40 ` Dmitry Torokhov @ 2009-12-16 15:22 ` Alan Stern 2009-12-16 19:26 ` Rafael J. Wysocki 2009-12-16 15:47 ` Linus Torvalds 2 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-16 15:22 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, 16 Dec 2009, Rafael J. Wysocki wrote: > I measured the total time of suspending and resuming devices as shown by the > code added by this patch: > http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67 > on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they are quite > different and the HP was running 64-bit kernel and user space). > I carried out 5 consecutive suspend-resume cycles (started from under X) on > each box in each case, and the raw data are here (all times in milliseconds): > http://www.sisk.pl/kernel/data/async-suspend.pdf I'd like to see much more detailed data. For each device, let's get the device name, the parent's name, and the start time, end time, and duration for suspend or resume. The start time should be measured when you have finished waiting for the children. The end time should be measured just before the complete_all(). Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
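The per-device record Alan asks for could be captured along these lines. This is only a userspace sketch: the struct, its field names and both helpers are hypothetical and are not taken from the PM core. The timestamps are assumed to be microsecond values taken where Alan describes, i.e. after the wait for the children has finished and just before complete_all().

```c
#include <stddef.h>

/* Hypothetical per-device record; not a structure from the kernel tree. */
struct dev_timing {
	const char *name;        /* device name */
	const char *parent;      /* parent's name */
	unsigned long start_us;  /* taken after waiting for the children */
	unsigned long end_us;    /* taken just before complete_all() */
};

/* Duration of one suspend or resume callback, in microseconds. */
unsigned long dev_duration_us(const struct dev_timing *t)
{
	return t->end_us - t->start_us;
}

/* Return the record with the longest duration, or NULL for an empty set. */
const struct dev_timing *slowest_device(const struct dev_timing *recs, size_t n)
{
	const struct dev_timing *worst = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!worst || dev_duration_us(&recs[i]) > dev_duration_us(worst))
			worst = &recs[i];
	}
	return worst;
}
```

With records like these for every device, the slow ones that dominate the total can be picked out directly instead of being guessed from the aggregate numbers.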
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-16 15:22 ` Alan Stern @ 2009-12-16 19:26 ` Rafael J. Wysocki 0 siblings, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-16 19:26 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wednesday 16 December 2009, Alan Stern wrote: > On Wed, 16 Dec 2009, Rafael J. Wysocki wrote: > > > I measured the total time of suspending and resuming devices as shown by the > > code added by this patch: > > http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67 > > on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they are quite > > different and the HP was running 64-bit kernel and user space). > > > I carried out 5 consecutive suspend-resume cycles (started from under X) on > > each box in each case, and the raw data are here (all times in milliseconds): > > http://www.sisk.pl/kernel/data/async-suspend.pdf > > I'd like to see much more detailed data. For each device, let's get > the device name, the parent's name, and the start time, end time, and > duration for suspend or resume. The start time should be measured when > you have finished waiting for the children. The end time should be > measured just before the complete_all(). I'm going to use the Arjan's patch + script to chart the suspend/resume times for individual devices. I can send you the raw data, though. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-16 2:11 ` Rafael J. Wysocki 2009-12-16 6:40 ` Dmitry Torokhov 2009-12-16 15:22 ` Alan Stern @ 2009-12-16 15:47 ` Linus Torvalds 2009-12-16 19:27 ` Rafael J. Wysocki 2 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-16 15:47 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:
>
> The summarized data are below (the "big" numbers are averages and the +/-
> numbers are standard deviations, all in milliseconds):
>
>                           HP nx6325        MSI Wind U100
>
> sync suspend              1482 (+/- 40)    1180 (+/- 24)
> sync resume               2955 (+/- 2)     3597 (+/- 25)
>
> async suspend             1553 (+/- 49)    1177 (+/- 32)
> async resume              2692 (+/- 326)   3556 (+/- 33)
>
> async+one-liner suspend   1600 (+/- 39)    1212 (+/- 41)
> async+one-liner resume    2692 (+/- 324)   3579 (+/- 24)
>
> async+extra suspend       1496 (+/- 37)    1217 (+/- 38)
> async+extra resume        1859 (+/- 114)   1923 (+/- 35)
>
> So, in my opinion, with the above set of "async" devices, it doesn't
> make sense to do async suspend at all, because the sync suspend is actually
> the fastest on both machines.

Hmm. I certainly agree - your numbers do not seem to support any async at all.

However, I do note that the "extra patch" makes a big difference at resume time. That implies that the resume serializes on some slow device that wasn't marked async - and starting the async ones early avoids that.

But without the per-device timings, it's hard to even guess what device that was.

But even that doesn't really help the suspend cases, only resume.

Do you have any sample timing output with devices listed?

Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
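Linus's reading of the "extra patch" result can be illustrated with a toy timing model. The sketch below is not the PM core's algorithm: it ignores parent/child dependencies, assumes unlimited parallelism, and every name in it (struct dev_cost, total_in_order(), total_upfront()) is made up for illustration. It only shows why a slow synchronous device early in the list delays async devices that are started in list order, and why starting them upfront hides that delay.

```c
#include <stddef.h>

/* Hypothetical record: one device's resume cost and whether it is async. */
struct dev_cost {
	const char *name;  /* illustrative label only */
	unsigned int ms;   /* time its resume callback takes */
	int is_async;      /* nonzero if the device is marked async */
};

/*
 * In-order dispatch: a sync device blocks the walk for its whole
 * duration; an async device is started when the walk reaches it and
 * then runs in the background.
 */
unsigned int total_in_order(const struct dev_cost *devs, size_t n)
{
	unsigned int now = 0;       /* where the dispatch loop is in time */
	unsigned int last_end = 0;  /* latest async completion seen so far */
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].is_async) {
			unsigned int end = now + devs[i].ms;

			if (end > last_end)
				last_end = end;
		} else {
			now += devs[i].ms;
		}
	}
	return now > last_end ? now : last_end;
}

/*
 * Upfront start: every async device begins at time 0, before the first
 * sync device is handled, so only the longest async one matters.
 */
unsigned int total_upfront(const struct dev_cost *devs, size_t n)
{
	unsigned int sync_sum = 0;
	unsigned int async_max = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!devs[i].is_async)
			sync_sum += devs[i].ms;
		else if (devs[i].ms > async_max)
			async_max = devs[i].ms;
	}
	return sync_sum > async_max ? sync_sum : async_max;
}
```

For a made-up list of sync 100 ms, async 300 ms, sync 200 ms, async 400 ms, the in-order model finishes at 700 ms and the upfront model at 400 ms. Real devices have ordering constraints that this model leaves out, which is why per-device timings are still needed.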
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-16 15:47 ` Linus Torvalds @ 2009-12-16 19:27 ` Rafael J. Wysocki 2009-12-16 20:59 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-16 19:27 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Wednesday 16 December 2009, Linus Torvalds wrote:
>
> On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:
> >
> > The summarized data are below (the "big" numbers are averages and the +/-
> > numbers are standard deviations, all in milliseconds):
> >
> >                             HP nx6325       MSI Wind U100
> >
> > sync suspend                1482 (+/- 40)   1180 (+/- 24)
> > sync resume                 2955 (+/- 2)    3597 (+/- 25)
> >
> > async suspend               1553 (+/- 49)   1177 (+/- 32)
> > async resume                2692 (+/- 326)  3556 (+/- 33)
> >
> > async+one-liner suspend     1600 (+/- 39)   1212 (+/- 41)
> > async+one-liner resume      2692 (+/- 324)  3579 (+/- 24)
> >
> > async+extra suspend         1496 (+/- 37)   1217 (+/- 38)
> > async+extra resume          1859 (+/- 114)  1923 (+/- 35)
> >
> > So, in my opinion, with the above set of "async" devices, it doesn't
> > make sense to do async suspend at all, because the sync suspend is actually
> > the fastest on both machines.
>
> Hmm. I certainly agree - your numbers do not seem to support any async at
> all.
>
> However, I do note that the "extra patch" makes a big difference at
> resume time. That implies that the resume serializes on some slow device
> that wasn't marked async - and starting the async ones early avoids that.
>
> But without the per-device timings, it's hard to even guess what device
> that was.
>
> But even that doesn't really help the suspend cases, only resume.
>
> Do you have any sample timing output with devices listed?

I'm going to generate one shortly.

Rafael

^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-16 19:27 ` Rafael J. Wysocki @ 2009-12-16 20:59 ` Linus Torvalds 2009-12-16 21:57 ` Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-16 20:59 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, 16 Dec 2009, Rafael J. Wysocki wrote: > > > > Do you have any sample timing output with devices listed? > > I'm going to generate one shortly. From my bootup timings, I have this memory of SATA link bringup being noticeable. I wonder if that is the case on resume too... Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-16 20:59 ` Linus Torvalds @ 2009-12-16 21:57 ` Rafael J. Wysocki 2009-12-16 22:11 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-16 21:57 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wednesday 16 December 2009, Linus Torvalds wrote: > > On Wed, 16 Dec 2009, Rafael J. Wysocki wrote: > > > > > > Do you have any sample timing output with devices listed? > > > > I'm going to generate one shortly. I've just put the first set of data, for the HP nx6325 at: http://www.sisk.pl/kernel/data/nx6325/ The *-dmesg.log files contain full dmesg outputs starting from a cold boot and including one suspend-resume cycle in each case, with debug_initcall enabled. The *-suspend.log files are excerpts from the *-dmesg.log files containing the suspend messages only, and analogously for *-resume.log. The *-times.txt files contain suspend/resume time for every device sorted in the decreasing order. > From my bootup timings, I have this memory of SATA link bringup being > noticeable. I wonder if that is the case on resume too... There's no SATA in the nx6325, only IDE, so we'd need to wait for the Wind data (in the works). The slowest suspending device in the nx6325 is the audio chip (surprise, surprise), it takes ~220 ms alone. Then - serio, but since i8042 was not async, the async suspend of serio didn't really help (another ~140 ms). Then network, FireWire, MMC, USB, SD host (~15 ms each). [I think we can help suspend a bit by making i8042 async, although I'm not sure that's going to be safe.] The slowest resuming are USB (by far) and then CardBus, audio, USB controllers, FireWire, network and IDE (but that only takes about 7 ms). 
But the main problem with async resume is that the USB devices are at the beginning of dpm_list, so the resume of them is not even started until _all_ of the slow devices behind them are woken up. That's why the extra patch helps so much IMO. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-16 21:57 ` Rafael J. Wysocki @ 2009-12-16 22:11 ` Linus Torvalds 2009-12-16 22:33 ` Rafael J. Wysocki 2009-12-16 23:04 ` Alan Stern 2009-12-17 1:49 ` Rafael J. Wysocki 2 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-16 22:11 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list Btw, what are the timings if you just force everything async? I think that worked on your laptops, no? It would be interesting to know - if only to see what the asymptotic upper bound for all of this is.. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-16 22:11 ` Linus Torvalds @ 2009-12-16 22:33 ` Rafael J. Wysocki 0 siblings, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-16 22:33 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wednesday 16 December 2009, Linus Torvalds wrote: > > Btw, what are the timings if you just force everything async? I think that > worked on your laptops, no? No, it didn't. I could make all PCI async, provided that the ACPI subtree was resumed before any PCI devices. [Theoretically I can make that happen by moving ACPI resume to the _noirq phase (just for testing of course). So I can try to make PCI async in addition to serio and USB, plus i8042 perhaps, which should be sufficient for the nx6325 I think.] Making everything async always hung the boxes on resume. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-16 21:57 ` Rafael J. Wysocki 2009-12-16 22:11 ` Linus Torvalds @ 2009-12-16 23:04 ` Alan Stern 2009-12-16 23:18 ` Rafael J. Wysocki 2009-12-17 1:49 ` Rafael J. Wysocki 2 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-16 23:04 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wed, 16 Dec 2009, Rafael J. Wysocki wrote: > I've just put the first set of data, for the HP nx6325 at: > http://www.sisk.pl/kernel/data/nx6325/ > > The *-dmesg.log files contain full dmesg outputs starting from a cold boot and > including one suspend-resume cycle in each case, with debug_initcall enabled. > > The *-suspend.log files are excerpts from the *-dmesg.log files containing > the suspend messages only, and analogously for *-resume.log. I've just started looking at the sync-suspend.log file. What are all the '+' characters and " @ 3368" strings after the device names? You didn't print out the parent name for each device, so the tree structure has been lost. Why do those "sd 0:0:0:0 [sda]" messages appear in between two callbacks? The cache-synchronization and the spin-down commands are not executed asynchronously. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-16 23:04 ` Alan Stern @ 2009-12-16 23:18 ` Rafael J. Wysocki 2009-12-17 1:30 ` Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-16 23:18 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Thursday 17 December 2009, Alan Stern wrote: > On Wed, 16 Dec 2009, Rafael J. Wysocki wrote: > > > I've just put the first set of data, for the HP nx6325 at: > > http://www.sisk.pl/kernel/data/nx6325/ > > > > The *-dmesg.log files contain full dmesg outputs starting from a cold boot and > > including one suspend-resume cycle in each case, with debug_initcall enabled. > > > > The *-suspend.log files are excerpts from the *-dmesg.log files containing > > the suspend messages only, and analogously for *-resume.log. > > I've just started looking at the sync-suspend.log file. What are all > the '+' characters and " @ 3368" strings after the device names? I think the + is necessary for the Arjan's graph-generating script and the @ number is the value of current (ie. the PID of the calling task). > You didn't print out the parent name for each device, so the tree > structure has been lost. That's because the original Arjan's patch doesn't do that, I'm adding it right now. > Why do those "sd 0:0:0:0 [sda]" messages appear in between two > callbacks? The cache-synchronization and the spin-down commands are > not executed asynchronously. Because the data are incomplete. :-( I've just realized that the Arjan's patch only covers bus types and classes that have been converted to dev_pm_ops already, so I'm extending it to the "legacy" ones at the moment. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-16 23:18 ` Rafael J. Wysocki @ 2009-12-17 1:30 ` Rafael J. Wysocki 0 siblings, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-17 1:30 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Thursday 17 December 2009, Rafael J. Wysocki wrote: > On Thursday 17 December 2009, Alan Stern wrote: > > On Wed, 16 Dec 2009, Rafael J. Wysocki wrote: > > > > > I've just put the first set of data, for the HP nx6325 at: > > > http://www.sisk.pl/kernel/data/nx6325/ > > > > > > The *-dmesg.log files contain full dmesg outputs starting from a cold boot and > > > including one suspend-resume cycle in each case, with debug_initcall enabled. > > > > > > The *-suspend.log files are excerpts from the *-dmesg.log files containing > > > the suspend messages only, and analogously for *-resume.log. > > > > I've just started looking at the sync-suspend.log file. What are all > > the '+' characters and " @ 3368" strings after the device names? > > I think the + is necessary for the Arjan's graph-generating script and the > @ number is the value of current (ie. the PID of the calling task). > > > You didn't print out the parent name for each device, so the tree > > structure has been lost. > > That's because the original Arjan's patch doesn't do that, I'm adding it > right now. > > > Why do those "sd 0:0:0:0 [sda]" messages appear in between two > > callbacks? The cache-synchronization and the spin-down commands are > > not executed asynchronously. > > Because the data are incomplete. :-( > > I've just realized that the Arjan's patch only covers bus types and classes > that have been converted to dev_pm_ops already, so I'm extending it to the > "legacy" ones at the moment. 
New data files have been uploaded to: http://www.sisk.pl/kernel/data/nx6325/ http://www.sisk.pl/kernel/data/wind/ Please let me know if you need more information. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-16 21:57 ` Rafael J. Wysocki 2009-12-16 22:11 ` Linus Torvalds 2009-12-16 23:04 ` Alan Stern @ 2009-12-17 1:49 ` Rafael J. Wysocki 2009-12-17 20:06 ` Alan Stern 2009-12-18 1:51 ` Rafael J. Wysocki 2 siblings, 2 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-17 1:49 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Wednesday 16 December 2009, Rafael J. Wysocki wrote: > On Wednesday 16 December 2009, Linus Torvalds wrote: > > > > On Wed, 16 Dec 2009, Rafael J. Wysocki wrote: > > > > > > > > Do you have any sample timing output with devices listed? > > > > > > I'm going to generate one shortly. > > I've just put the first set of data, for the HP nx6325 at: > http://www.sisk.pl/kernel/data/nx6325/ As I said in a message to Alan, the data were incomplete, because the original Arjan's patch only covers bus types and device classes converted to dev_pm_ops, which I only noticed earlier today. So I added the appended patch on top of the async tree and I applied a one-liner adding the name of the parent to each device line during (regular) suspend and resume. The new data sets are at: http://www.sisk.pl/kernel/data/nx6325/ http://www.sisk.pl/kernel/data/wind/ and the format is the same as described below. > The *-dmesg.log files contain full dmesg outputs starting from a cold boot and > including one suspend-resume cycle in each case, with debug_initcall enabled. > > The *-suspend.log files are excerpts from the *-dmesg.log files containing > the suspend messages only, and analogously for *-resume.log. > > The *-times.txt files contain suspend/resume time for every device sorted > in the decreasing order. > > > From my bootup timings, I have this memory of SATA link bringup being > > noticeable. I wonder if that is the case on resume too... That actually is correct. 
On the nx6325 suspend is totally dominated by disk spindown, almost everything else is negligible compared to it (well, except for the audio), so we can't go down below 1 s during suspend on this box. On the Wind, disk spindown time is comparable with serio suspend time, so at least in principle we should be able to get .5 s suspend on this box - if the disk spindown is async. In turn, the resume on the Wind is dominated by disk spinup, so we can't go below 1.5 s on this box during resume (notice that the "async+extra" approach brings us close to this limit, although we could save .5 s more in principle by making more devices async). Resume on the nx6325 is a different story, though, as it is dominated by USB and PCI devices, so marking those as async would probably bring us close to the limit. [Surprisingly enough to me some ACPI devices appear to take quite noticeable amounts of time to resume on both boxes.] Tomorrow I'll try to mark as many devices as reasonably possible as async and see how the total suspend-resume times change. 
Rafael

---
 drivers/base/power/main.c |   97 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 77 insertions(+), 20 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -165,6 +165,32 @@ void device_pm_move_last(struct device *
 	list_move_tail(&dev->power.entry, &dpm_list);
 }

+static ktime_t initcall_debug_start(struct device *dev)
+{
+	ktime_t calltime = ktime_set(0, 0);
+
+	if (initcall_debug) {
+		pr_info("calling %s_i+ @ %i\n",
+				dev_name(dev), task_pid_nr(current));
+		calltime = ktime_get();
+	}
+
+	return calltime;
+}
+
+static void initcall_debug_report(struct device *dev, ktime_t calltime,
+				  int error)
+{
+	ktime_t delta, rettime;
+
+	if (initcall_debug) {
+		rettime = ktime_get();
+		delta = ktime_sub(rettime, calltime);
+		pr_info("call %s+ returned %d after %Ld usecs\n", dev_name(dev),
+			error, (unsigned long long)ktime_to_ns(delta) >> 10);
+	}
+}
+
 /**
  * dpm_wait - Wait for a PM operation to complete.
  * @dev: Device to wait for.
@@ -201,13 +227,9 @@ static int pm_op(struct device *dev,
 		 pm_message_t state)
 {
 	int error = 0;
-	ktime_t calltime, delta, rettime;
+	ktime_t calltime;

-	if (initcall_debug) {
-		pr_info("calling %s+ @ %i\n",
-				dev_name(dev), task_pid_nr(current));
-		calltime = ktime_get();
-	}
+	calltime = initcall_debug_start(dev);

 	switch (state.event) {
 #ifdef CONFIG_SUSPEND
@@ -256,12 +278,7 @@ static int pm_op(struct device *dev,
 		error = -EINVAL;
 	}

-	if (initcall_debug) {
-		rettime = ktime_get();
-		delta = ktime_sub(rettime, calltime);
-		pr_info("call %s+ returned %d after %Ld usecs\n", dev_name(dev),
-			error, (unsigned long long)ktime_to_ns(delta) >> 10);
-	}
+	initcall_debug_report(dev, calltime, error);

 	return error;
 }
@@ -338,8 +355,9 @@ static int pm_noirq_op(struct device *de
 	if (initcall_debug) {
 		rettime = ktime_get();
 		delta = ktime_sub(rettime, calltime);
-		printk("initcall %s_i+ returned %d after %Ld usecs\n", dev_name(dev),
-			error, (unsigned long long)ktime_to_ns(delta) >> 10);
+		printk("initcall %s_i+ returned %d after %Ld usecs\n",
+			dev_name(dev), error,
+			(unsigned long long)ktime_to_ns(delta) >> 10);
 	}

 	return error;
@@ -456,6 +474,26 @@ void dpm_resume_noirq(pm_message_t state
 EXPORT_SYMBOL_GPL(dpm_resume_noirq);

 /**
+ * legacy_resume - Execute a legacy (bus or class) resume callback for device.
+ * dev: Device to resume.
+ * cb: Resume callback to execute.
+ */
+static int legacy_resume(struct device *dev, int (*cb)(struct device *dev))
+{
+	int error;
+	ktime_t calltime;
+
+	calltime = initcall_debug_start(dev);
+
+	error = cb(dev);
+	suspend_report_result(cb, error);
+
+	initcall_debug_report(dev, calltime, error);
+
+	return error;
+}
+
+/**
  * __device_resume - Execute "resume" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
@@ -477,7 +515,7 @@ static int __device_resume(struct device
 			error = pm_op(dev, dev->bus->pm, state);
 		} else if (dev->bus->resume) {
 			pm_dev_dbg(dev, state, "legacy ");
-			error = dev->bus->resume(dev);
+			error = legacy_resume(dev, dev->bus->resume);
 		}
 		if (error)
 			goto End;
@@ -498,7 +536,7 @@ static int __device_resume(struct device
 			error = pm_op(dev, dev->class->pm, state);
 		} else if (dev->class->resume) {
 			pm_dev_dbg(dev, state, "legacy class ");
-			error = dev->class->resume(dev);
+			error = legacy_resume(dev, dev->class->resume);
 		}
 	}
  End:
@@ -734,6 +772,27 @@ EXPORT_SYMBOL_GPL(dpm_suspend_noirq);
 static int async_error;

 /**
+ * legacy_suspend - Execute a legacy (bus or class) suspend callback for device.
+ * dev: Device to suspend.
+ * cb: Suspend callback to execute.
+ */
+static int legacy_suspend(struct device *dev, pm_message_t state,
+			  int (*cb)(struct device *dev, pm_message_t state))
+{
+	int error;
+	ktime_t calltime;
+
+	calltime = initcall_debug_start(dev);
+
+	error = cb(dev, state);
+	suspend_report_result(cb, error);
+
+	initcall_debug_report(dev, calltime, error);
+
+	return error;
+}
+
+/**
  * device_suspend - Execute "suspend" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
@@ -755,8 +814,7 @@ static int __device_suspend(struct devic
 			error = pm_op(dev, dev->class->pm, state);
 		} else if (dev->class->suspend) {
 			pm_dev_dbg(dev, state, "legacy class ");
-			error = dev->class->suspend(dev, state);
-			suspend_report_result(dev->class->suspend, error);
+			error = legacy_suspend(dev, state, dev->class->suspend);
 		}
 		if (error)
 			goto End;
@@ -777,8 +835,7 @@ static int __device_suspend(struct devic
 			error = pm_op(dev, dev->bus->pm, state);
 		} else if (dev->bus->suspend) {
 			pm_dev_dbg(dev, state, "legacy ");
-			error = dev->bus->suspend(dev, state);
-			suspend_report_result(dev->bus->suspend, error);
+			error = legacy_suspend(dev, state, dev->bus->suspend);
 		}
 	}

^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-17 1:49 ` Rafael J. Wysocki @ 2009-12-17 20:06 ` Alan Stern 2009-12-17 20:36 ` Rafael J. Wysocki 2009-12-18 1:51 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-17 20:06 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Thu, 17 Dec 2009, Rafael J. Wysocki wrote: > That actually is correct. On the nx6325 suspend is totally dominated by disk > spindown, almost everything else is negligible compared to it (well, except for > the audio), so we can't go down below 1 s during suspend on this box. > > On the Wind, disk spindown time is comparable with serio suspend time, > so at least in principle we should be able to get .5 s suspend on this box - > if the disk spindown in async. > > In turn, the resume on the Wind is dominated by disk spinup, so we can't > go below 1.5 s on this box during resume (notice that the "async+extra" > approach brings us close to this limit, although we could save .5 s more in > principle by making more devices async). > > Resume on the nx6325 is a different story, though, as it is dominated by USB > and PCI devices, so marking those as async would probably bring us close to > the limit. The implications seem pretty clear. If the following sorts of devices were async: USB (devices and interfaces), PCI, serio, SCSI (hosts, targets, devices) then we would reap close to the maximum benefit -- providing: async threads are started in a first pass without waiting for synchronous devices, and It's not clear that making all these types of devices async will really work, but it's worth testing. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-17 20:06 ` Alan Stern @ 2009-12-17 20:36 ` Rafael J. Wysocki 0 siblings, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-17 20:36 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Thursday 17 December 2009, Alan Stern wrote: > On Thu, 17 Dec 2009, Rafael J. Wysocki wrote: > > > That actually is correct. On the nx6325 suspend is totally dominated by disk > > spindown, almost everything else is negligible compared to it (well, except for > > the audio), so we can't go down below 1 s during suspend on this box. > > > > On the Wind, disk spindown time is comparable with serio suspend time, > > so at least in principle we should be able to get .5 s suspend on this box - > > if the disk spindown in async. > > > > In turn, the resume on the Wind is dominated by disk spinup, so we can't > > go below 1.5 s on this box during resume (notice that the "async+extra" > > approach brings us close to this limit, although we could save .5 s more in > > principle by making more devices async). > > > > Resume on the nx6325 is a different story, though, as it is dominated by USB > > and PCI devices, so marking those as async would probably bring us close to > > the limit. > > The implications seem pretty clear. If the following sorts of devices > were async: > > USB (devices and interfaces), PCI, serio, SCSI (hosts, targets, > devices) Plus ACPI battery. > then we would reap close to the maximum benefit -- providing: > > async threads are started in a first pass without waiting > for synchronous devices, and Agreed. > It's not clear that making all these types of devices async will really > work, but it's worth testing. I'm working on it. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-17 1:49 ` Rafael J. Wysocki 2009-12-17 20:06 ` Alan Stern @ 2009-12-18 1:51 ` Rafael J. Wysocki 2009-12-18 17:26 ` Alan Stern 2009-12-18 23:42 ` Rafael J. Wysocki 1 sibling, 2 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-18 1:51 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Thursday 17 December 2009, Rafael J. Wysocki wrote:
...
> Tomorrow I'll try to mark as many devices as reasonably possible as async
> and see how the total suspend-resume times change.

I didn't manage to do that, but I was able to mark sd and i8042 as async and
see the impact of this. The raw data are in the usual place:

http://www.sisk.pl/kernel/data/async-suspend-resume.pdf

and the individual device timings and logs are in:

http://www.sisk.pl/kernel/data/nx6325/
http://www.sisk.pl/kernel/data/wind/

This is the summary (previous results are included for easier reference):

                            HP nx6325       MSI Wind U100

sync suspend                1482 (+/- 40)   1180 (+/- 24)
sync resume                 2955 (+/- 2)    3597 (+/- 25)

async suspend               1553 (+/- 49)   1177 (+/- 32)
async resume                2692 (+/- 326)  3556 (+/- 33)

async+one-liner suspend     1600 (+/- 39)   1212 (+/- 41)
async+one-liner resume      2692 (+/- 324)  3579 (+/- 24)

async+extra suspend         1496 (+/- 37)   1217 (+/- 38)
async+extra resume          1859 (+/- 114)  1923 (+/- 35)

with "async" i8042 and sd:

async suspend               1319 (+/- 51)   1045 (+/- 41)
async resume                2929 (+/- 3)    3546 (+/- 27)

async+extra suspend         1327 (+/- 36)   (didn't work)
async+extra resume          1742 (+/- 164)  1896 (+/- 28)

(the summary is also available at: http://www.sisk.pl/kernel/data/results.txt).

So, it actually makes the case for async suspend! Although it's not very
strong, with these two additional devices marked as "async" we get noticeable
suspend time improvement.

Still, the "extra" patch doesn't help on suspend at all and on the Wind the suspend part of it didn't even work (I'm yet to figure out which of the two devices crashed the suspend). Nevertheless the resume part of the "extra" patch worked in both cases and worked better than without the two additional "async" devices. To me, this means that the suspend part of the "extra" patch is not really useful. However, the resume part of it is _very_ useful, so I'd like to add that part only to the async patchset. The explanation why it helps so much is also straightforward to me. Namely, if slow async devices are last to resume, then without the "extra" patch they need to wait for all of the preceding sync devices and the speedup from executing their resume routines asynchronously is very limited. Now, with the "extra" patch their resume routines start as soon as their parents complete resuming and that may be early enough for the speedup to be significant. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-18 1:51 ` Rafael J. Wysocki @ 2009-12-18 17:26 ` Alan Stern 2009-12-19 21:41 ` Rafael J. Wysocki 2009-12-18 23:42 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-18 17:26 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Fri, 18 Dec 2009, Rafael J. Wysocki wrote: > I didn't manage to do that, but I was able to mark sd and i8042 as async and > see the impact of this. Apparently this didn't do what you wanted. In the nx6325 sd+i8042+async+extra log, the 0:0:0:0 device (which is a SCSI disk) was suspended by the main thread instead of an async thread. There's an important point I neglected to mention before. Your logs don't show anything for devices with no suspend callbacks at all. Nevertheless, these devices sit on the device list and prevent other devices from suspending or resuming as soon as they could. For example, the fingerprint sensor (3-1) took the most time to resume. But other devices were delayed until after it finished because it had children with no callbacks, and they delayed the devices following them in the list. What would happen if you completed these devices immediately, as part of the first pass? Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-18 17:26 ` Alan Stern @ 2009-12-19 21:41 ` Rafael J. Wysocki 2009-12-20 3:48 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-19 21:41 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Friday 18 December 2009, Alan Stern wrote: > On Fri, 18 Dec 2009, Rafael J. Wysocki wrote: > > > I didn't manage to do that, but I was able to mark sd and i8042 as async and > > see the impact of this. > > Apparently this didn't do what you wanted. In the nx6325 > sd+i8042+async+extra log, the 0:0:0:0 device (which is a SCSI disk) was > suspended by the main thread instead of an async thread. Hm, that's odd, because there's a noticeable time difference between the two cases in which the sd is sync and async. I'll look into it further. > There's an important point I neglected to mention before. Your logs > don't show anything for devices with no suspend callbacks at all. > Nevertheless, these devices sit on the device list and prevent other > devices from suspending or resuming as soon as they could. Unless they are async, that is. > For example, the fingerprint sensor (3-1) took the most time to resume. > But other devices were delayed until after it finished because it had > children with no callbacks, and they delayed the devices following > them in the list. > > What would happen if you completed these devices immediately, as part > of the first pass? OK. How is the PM core supposed to check if a device has null suspend and resume? Check all of the function pointers in the first pass? Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-19 21:41 ` Rafael J. Wysocki @ 2009-12-20 3:48 ` Alan Stern 2009-12-20 12:55 ` Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-20 3:48 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sat, 19 Dec 2009, Rafael J. Wysocki wrote: > On Friday 18 December 2009, Alan Stern wrote: > > On Fri, 18 Dec 2009, Rafael J. Wysocki wrote: > > > > > I didn't manage to do that, but I was able to mark sd and i8042 as async and > > > see the impact of this. > > > > Apparently this didn't do what you wanted. In the nx6325 > > sd+i8042+async+extra log, the 0:0:0:0 device (which is a SCSI disk) was To be precise, the device is an ATA or SATA disk but it is managed by the sd driver. > > suspended by the main thread instead of an async thread. > > Hm, that's odd, because there's a noticeable time difference between the > two cases in which the sd is sync and async. I'll look into it further. I don't know what the whole story is, but the PID number tells the tale. > > There's an important point I neglected to mention before. Your logs > > don't show anything for devices with no suspend callbacks at all. > > Nevertheless, these devices sit on the device list and prevent other > > devices from suspending or resuming as soon as they could. > > Unless they are async, that is. Yes. It would be simpler to make them async. But first we ought to know what they are. Can you add an extra line to the log for such devices? What I'm afraid of is that there might be a "normal" device with a "normal" ancestor but with "abnormal" devices in between (where "normal" means there is a suspend or resume routine and "abnormal" means all the method pointers are NULL). I know that this happens when there's a USB mass-storage device, for example. 
If we complete the intermediate devices immediately, then there won't be anything to prevent the ancestor from suspending before the device or the device from resuming before the ancestor. Forcing the "abnormal" devices to be async, even if they aren't marked that way, would avoid these problems. > > For example, the fingerprint sensor (3-1) took the most time to resume. > > But other devices were delayed until after it finished because it had > > children with no callbacks, and they delayed the devices following > > them in the list. > > > > What would happen if you completed these devices immediately, as part > > of the first pass? > > OK. How do the PM core is supposed to check if a device has null suspend > and resume? Check all of the function pointers in the first pass? All the relevant pointers (including the legacy pointers). That is, you check only the suspend pointers during the first suspend pass, and likewise for resume. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-20 3:48 ` Alan Stern @ 2009-12-20 12:55 ` Rafael J. Wysocki 0 siblings, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-20 12:55 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sunday 20 December 2009, Alan Stern wrote: > On Sat, 19 Dec 2009, Rafael J. Wysocki wrote: > > > On Friday 18 December 2009, Alan Stern wrote: > > > On Fri, 18 Dec 2009, Rafael J. Wysocki wrote: > > > > > > > I didn't manage to do that, but I was able to mark sd and i8042 as async and > > > > see the impact of this. > > > > > > Apparently this didn't do what you wanted. In the nx6325 > > > sd+i8042+async+extra log, the 0:0:0:0 device (which is a SCSI disk) was > > To be precise, the device is an ATA or SATA disk but it is managed by > the sd driver. > > > > suspended by the main thread instead of an async thread. > > > > Hm, that's odd, because there's a noticeable time difference between the > > two cases in which the sd is sync and async. I'll look into it further. > > I don't know what the whole story is, but the PID number tells the > tale. > > > > There's an important point I neglected to mention before. Your logs > > > don't show anything for devices with no suspend callbacks at all. > > > Nevertheless, these devices sit on the device list and prevent other > > > devices from suspending or resuming as soon as they could. > > > > Unless they are async, that is. > > Yes. It would be simpler to make them async. But first we ought to > know what they are. Can you add an extra line to the log for such > devices? Sure, I'll do that. > What I'm afraid of is that there might be a "normal" device with a > "normal" ancestor but with "abnormal" devices in between (where > "normal" means there is a suspend or resume routine and "abnormal" > means all the method pointers are NULL). 
I know that this happens when > there's a USB mass-storage device, for example. If we complete the > intermediate devices immediately, then there won't be anything to > prevent the ancestor from suspending before the device or the device > from resuming before the ancestor. I'm afraid of that too. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-18 1:51 ` Rafael J. Wysocki 2009-12-18 17:26 ` Alan Stern @ 2009-12-18 23:42 ` Rafael J. Wysocki 1 sibling, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-18 23:42 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Friday 18 December 2009, Rafael J. Wysocki wrote: > On Thursday 17 December 2009, Rafael J. Wysocki wrote: > ... > > Tomorrow I'll try to mark as many devices as reasonably possible as async > > and see how the total suspend-resume times change. > > I didn't manage to do that, but I was able to mark sd and i8042 as async and > see the impact of this. > > The raw data are in the usual place: > > http://www.sisk.pl/kernel/data/async-suspend-resume.pdf > > and the individual device timings and logs are in: > > http://www.sisk.pl/kernel/data/nx6325/ > http://www.sisk.pl/kernel/data/wind/ > > This is the summary (previous results are included for easier reference):
>
>                             HP nx6325        MSI Wind U100
>
> sync suspend                1482 (+/- 40)    1180 (+/- 24)
> sync resume                 2955 (+/- 2)     3597 (+/- 25)
>
> async suspend               1553 (+/- 49)    1177 (+/- 32)
> async resume                2692 (+/- 326)   3556 (+/- 33)
>
> async+one-liner suspend     1600 (+/- 39)    1212 (+/- 41)
> async+one-liner resume      2692 (+/- 324)   3579 (+/- 24)
>
> async+extra suspend         1496 (+/- 37)    1217 (+/- 38)
> async+extra resume          1859 (+/- 114)   1923 (+/- 35)
>
> with "async" i8042 and sd:
>
> async suspend               1319 (+/- 51)    1045 (+/- 41)
> async resume                2929 (+/- 3)     3546 (+/- 27)
>
> async+extra suspend         1327 (+/- 36)    (didn't work)
> async+extra resume          1742 (+/- 164)   1896 (+/- 28)
>
> (the summary is also available at: http://www.sisk.pl/kernel/data/results.txt). > > So, it actually makes the case for async suspend! Although it's not very > strong, with these two additional devices marked as "async" we get noticeable > suspend time improvement. > > Still, the "extra" patch doesn't help on suspend at all and on the Wind the > suspend part of it didn't even work (I'm yet to figure out which of the two > devices crashed the suspend). Small update. I've just verified that sd was the failing device, although I'm not sure about the reason. Apart from this, I ran some tests on the Wind with i8042 marked as "async" and sd marked as "sync". In that case all of the tests succeeded and I got the following numbers:

suspend (i8042 async, full extra patch applied): 1070 (+/- 40)
resume (i8042 async, full extra patch applied): 1915,84 (+/- 27)
suspend (i8042 async, resume part of extra patch applied): 1050 (+/- 34)

First, it looks like the suspend speedup was related to marking i8042 as "async". Since the serio devices, which are the i8042's children, were also "async" (just like in all of the tests before), this means that the speedup resulted from removing a suspend stall caused by a sync parent of async children (i8042 and serio, respectively, in this case). However, the suspend part of the extra patch doesn't really help. In fact it even makes things worse. So, I still think the resume part of the extra patch is definitely useful, but the suspend part of it is not. IOW, it's worth running async resumes upfront, but it's not worth running async suspends upfront. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-12 18:54 ` Linus Torvalds 2009-12-12 22:34 ` Rafael J. Wysocki @ 2009-12-13 13:08 ` Rafael J. Wysocki 2009-12-13 17:30 ` Alan Stern 2 siblings, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-13 13:08 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Stern, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Saturday 12 December 2009, Linus Torvalds wrote: > > On Sat, 12 Dec 2009, Rafael J. Wysocki wrote: > > > > I'd like to put it into my tree in this form, if you don't mind. > > This version still has a major problem, which is not related to > completions vs rwsems, but simply to the fact that you wanted to do this > at the generic device layer level rather than do it at the actual > low-level suspend/resume level. > > Namely that there's no apparent sane way to say "don't wait for children". There is, if the parent would really do something that could disturb the children. This isn't always the case, but at least in a few important cases it is (think of a USB controller and USB devices behind it, for example). I thought we had this discussion already, but perhaps that was with someone else and in a slightly different context. The main reasons why I think it's useful to do this at the generic device layer level are that, if we do it this way: a. Drivers that don't want to be "asynchronous" don't need to care in any case. b. Drivers whose suspend and resume routines are guaranteed not to disturb anyone else can mark their devices as "async" and be done with it, no other modification of the code is needed (drivers that do nothing in their suspend and resume routines also fall into this category). Now, if it's done at the low-level suspend/resume level, a. will not be true any more in general. Say device A has parent B and the driver of A wants to suspend asynchronously. 
It needs to split its suspend into a synchronous and an asynchronous part and at one point start an async thread to run the latter. Now assume B has a real reason not to suspend before the suspend of A has finished. Then, the driver of B has to be modified so that it waits for A's async suspend to complete (some sort of synchronization between the two has to be added). So, even if B is "synchronous", its driver has to be modified to handle the asynchronous suspend of A. Similarly, b. will no longer be true if it's done at the low-level suspend/resume level, because now every driver that wants to be "asynchronous" will need to take care of running an async thread etc. Moreover, it will need to make sure that the device parent's driver doesn't need to be modified, because the parent's suspend may do something that will disturb the child's asynchronous suspend. Furthermore, if the parent's driver doesn't need to be modified, it will need to consider the parent of the parent, because that one may potentially disturb the asynchronous suspend of its grandchild and so on up to a device without a parent. That already is a pain to a driver writer, but the problem you're saying would be solved by doing this at the low-level suspend/resume level is still there in general! Namely, go back to the example with devices A and B and say B _really_ has to wait for A's suspend to complete. Then, since B is after A in dpm_list, the PM core will not start the suspend of any device after B until the suspend of B returns. Now, if the suspend of B waits for the suspend of A, then the PM core will effectively wait for the suspend of A to complete before suspending any other devices. Worse yet, if that happens, we can't do anything about it at the low-level suspend/resume level, although at the PM core level we can. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-12 18:54 ` Linus Torvalds 2009-12-12 22:34 ` Rafael J. Wysocki 2009-12-13 13:08 ` Rafael J. Wysocki @ 2009-12-13 17:30 ` Alan Stern 2009-12-13 19:02 ` [linux-pm] " Alan Stern 2 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-13 17:30 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sat, 12 Dec 2009, Linus Torvalds wrote: > This version still has a major problem, which is not related to > completions vs rwsems, but simply to the fact that you wanted to do this > at the generic device layer level rather than do it at the actual > low-level suspend/resume level. > > Namely that there's no apparent sane way to say "don't wait for children". > > PCI bridges that don't suspend at all - or any other device that only > suspends in the 'suspend_late()' thing, for that matter - don't have any > reason what-so-ever to wait for children, since they aren't actually > suspending in the first place. But you make them wait regardless, which > then serializes things unnecessarily (for example, two unrelated USB > controllers). In reality this should never be a problem. Consider that ultimately we want to achieve the following two goals: Implement a two-pass algorithm, so that synchronous devices can't cause spurious dependencies between two async devices. (This will fix the issue of an intermediate PCI bridge serializing two unrelated USB controllers.) Convert all lengthy suspend/resume operations to async. Obviously we don't want to do this all at once. But until the goals are achieved, there's no point worrying about devices being forced to wait for their children or parents. And after the goals are achieved, it won't matter. Why not? Consider the devices which would be delayed. If they use synchronous suspend/resume then they won't take much time, so delaying them won't matter. 
Indeed, based on Arjan's preliminary measurements it's fair to say that the total time taken by all the synchronous suspends/resumes put together should be negligible. Even if all of them were somehow delayed until all the async activities were complete, nobody would notice or care. (And conversely, if all the async activities could somehow be forced to wait until all the synchronous suspends/resumes were done, nobody would notice or care.) Okay, so consider a case where A comes before B in dpm_list and B is the parent of C. Suppose B doesn't need to wait for C to suspend, but we force it to wait anyhow. If A or C is synchronous then we're okay, by the considerations above. Suppose A is async. Then it wouldn't be delayed unless it was one of B's ancestors, so suppose it is. Now we are potentially delaying A more than necessary. Or are we? Even though B might not need to wait for C to suspend, there's an excellent chance that A _does_ need to wait for C. If we allow B to suspend before C then there would be nothing to prevent A from suspending too quickly. A's driver would need to wait explicitly for C -- which is unreasonable since C isn't one of A's children. (Rafael made a similar point.) In short, allowing devices to suspend before their children would be dangerous and probably would not save a significant amount of time. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [linux-pm] Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-13 17:30 ` Alan Stern @ 2009-12-13 19:02 ` Alan Stern 0 siblings, 0 replies; 235+ messages in thread From: Alan Stern @ 2009-12-13 19:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list On Sun, 13 Dec 2009, Alan Stern wrote: > > Namely that there's no apparent sane way to say "don't wait for children". > > > > PCI bridges that don't suspend at all - or any other device that only > > suspends in the 'suspend_late()' thing, for that matter - don't have any > > reason what-so-ever to wait for children, since they aren't actually > > suspending in the first place. But you make them wait regardless, which > > then serializes things unnecessarily (for example, two unrelated USB > > controllers). > In short, allowing devices to suspend before their children would be > dangerous and probably would not save a significant amount of time. There's more to be said. Even without this "don't wait for children" thing, there can be bad interactions causing unnecessary delays. For example, suppose A (async) is the parent of B (sync), B comes before C (sync) in dpm_list, and C is the parent of D (async). Even if A & B are unrelated to C & D, they will be forced to wait for them. It doesn't matter that A and D are unrelated and so could suspend concurrently. In essence, every synchronous device is treated as though it depends on all the synchronous devices preceding it in dpm_list. That's a lot of unnecessary constraints. At the moment we have no choice, because we have to assume that some of those constraints actually are necessary -- and we don't know which ones. It's an inescapable fact: If there are unnecessary ordering constraints then you generally can't be 100% efficient in carrying out parallel operations. 
Compared with all these extra "synchronous" constraints, the relatively small number of "don't need to wait for children" constraints is harmless. I bet that if we got rid of all unnecessary constraints except for making parents always wait for their children, we'd attain more than 95% of the ideal speedup. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
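Alan's A/B/C/D example can be made concrete with a small timing model. The sketch below is only an illustration under stated assumptions: time units are arbitrary, an unlimited number of async threads is assumed, the durations (A=5, B=1, C=1, D=10) are made up, and `sim_dev`, `sim_suspend()` and the demo helpers are hypothetical names, not anything from the patch. Synchronous devices run on the "main thread" in reverse dpm_list order; async devices are dispatched as soon as the main thread reaches them; every device waits for its children.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MAX_DEVS 16

struct sim_dev {
	int parent;	/* index of the parent in the array, or -1 */
	bool async;	/* suspended by an async thread? */
	int duration;	/* time taken by the suspend callback */
	int finish;	/* filled in by sim_suspend() */
};

/*
 * devs[] is in dpm_list order (parents before children), so suspend walks
 * it backwards.  Returns the time at which the last suspend completes.
 */
static int sim_suspend(struct sim_dev *devs, int n)
{
	int children_done[SIM_MAX_DEVS] = { 0 };  /* latest child finish time */
	int main_clock = 0;	/* when the main thread is free again */
	int total = 0;

	assert(n >= 0 && n <= SIM_MAX_DEVS);
	for (int i = n - 1; i >= 0; i--) {
		struct sim_dev *d = &devs[i];
		int start;

		assert(d->parent < i);	/* parents precede their children */

		/* Dispatched by the main thread, then waits for children. */
		start = main_clock > children_done[i] ?
			main_clock : children_done[i];
		d->finish = start + d->duration;

		if (!d->async)
			main_clock = d->finish;	/* main thread was blocked */
		if (d->parent >= 0 && d->finish > children_done[d->parent])
			children_done[d->parent] = d->finish;
		if (d->finish > total)
			total = d->finish;
	}
	return total;
}

/* A (async) -> B (sync), C (sync) -> D (async); list order A, B, C, D. */
static int demo_mixed(void)
{
	struct sim_dev devs[] = {
		{ .parent = -1, .async = true,  .duration = 5 },	/* A */
		{ .parent = 0,  .async = false, .duration = 1 },	/* B */
		{ .parent = -1, .async = false, .duration = 1 },	/* C */
		{ .parent = 2,  .async = true,  .duration = 10 },	/* D */
	};

	return sim_suspend(devs, 4);
}

/* Same tree, but with only the parent/child constraints left. */
static int demo_all_async(void)
{
	struct sim_dev devs[] = {
		{ .parent = -1, .async = true, .duration = 5 },	/* A */
		{ .parent = 0,  .async = true, .duration = 1 },	/* B */
		{ .parent = -1, .async = true, .duration = 1 },	/* C */
		{ .parent = 2,  .async = true, .duration = 10 },	/* D */
	};

	return sim_suspend(devs, 4);
}
```

With these made-up durations the mixed case finishes at 17 (B cannot start until C is done, and C is waiting for D), while keeping only the parent-waits-for-children constraints finishes at 11, which is the length of the longer of the two independent chains.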
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-11 23:48 ` Rafael J. Wysocki 2009-12-11 23:53 ` Linus Torvalds @ 2009-12-12 0:43 ` Alan Stern 2009-12-12 17:35 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-12 0:43 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Sat, 12 Dec 2009, Rafael J. Wysocki wrote: > Below is a patch I've just tested, but there's a lockdep problem in it I don't > know how to solve. Namely, lockdep is apparently unhappy with us not releasing > the lock taken in device_suspend() and it complains we take it twice in a row > (which we do, but for another device). I need to use down_read_non_owner() > to make it shut up and then I also need to use up_read_non_owner() in > __device_suspend(), although there's the comment in include/linux/rwsem.h > saying exactly this about that: > > /* > * Take/release a lock when not the owner will release it. > * > * [ This API should be avoided as much as possible - the > * proper abstraction for this case is completions. ] > */ > > (I'd like to know your opinion about that). Yet, that's not all, because next > it complains during resume that __device_resume() releases a lock it didn't > acquire, which it clearly does, but that is intentional. Unfortunately, > there's no up_write_non_owner() ... Hah! I knew it! How come lockdep didn't complain earlier? What's different about this patch? Only the nesting annotations? Why should adding annotations make lockdep less happy? Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-12 0:43 ` Alan Stern @ 2009-12-12 17:35 ` Rafael J. Wysocki 0 siblings, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-12 17:35 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Saturday 12 December 2009, Alan Stern wrote: > On Sat, 12 Dec 2009, Rafael J. Wysocki wrote: > > > Below is a patch I've just tested, but there's a lockdep problem in it I don't > > know how to solve. Namely, lockdep is apparently unhappy with us not releasing > > the lock taken in device_suspend() and it complains we take it twice in a row > > (which we do, but for another device). I need to use down_read_non_owner() > > to make it shut up and then I also need to use up_read_non_owner() in > > __device_suspend(), although there's the comment in include/linux/rwsem.h > > saying exactly this about that: > > > > /* > > * Take/release a lock when not the owner will release it. > > * > > * [ This API should be avoided as much as possible - the > > * proper abstraction for this case is completions. ] > > */ > > > > (I'd like to know your opinion about that). Yet, that's not all, because next > > it complains during resume that __device_resume() releases a lock it didn't > > acquire, which it clearly does, but that is intentional. Unfortunately, > > there's no up_write_non_owner() ... > > Hah! I knew it! > > How come lockdep didn't complain earlier? What's different about this > patch? Only the nesting annotations? Why should adding annotations > make lockdep less happy? I'm not sure. Perhaps I made a mistake during the previous tests. Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-09 23:18 ` Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) Rafael J. Wysocki 2009-12-10 2:51 ` Linus Torvalds @ 2009-12-10 15:31 ` Alan Stern 2009-12-10 15:45 ` Linus Torvalds 2009-12-10 21:14 ` Rafael J. Wysocki 1 sibling, 2 replies; 235+ messages in thread From: Alan Stern @ 2009-12-10 15:31 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Thu, 10 Dec 2009, Rafael J. Wysocki wrote: > > How about CONFIG_PROVE_LOCKING? If lockdep really does start > > complaining then switching to completions would be a simple way to > > appease it. > > Ah, that one is not set. I guess I'll try it later, although I've already > decided to use completions anyway. You should see how badly lockdep complains about the rwsems. If it really doesn't like them then using completions makes sense. > Index: linux-2.6/drivers/base/power/main.c > =================================================================== > --- linux-2.6.orig/drivers/base/power/main.c > +++ linux-2.6/drivers/base/power/main.c > @@ -56,6 +58,7 @@ static bool transition_started; > void device_pm_init(struct device *dev) > { > dev->power.status = DPM_ON; > + init_completion(&dev->power.completion); > pm_runtime_init(dev); > } You need a matching complete_all() in device_pm_remove(), in case someone else is waiting for the device when it gets unregistered. > +/** > + * dpm_synchronize - Wait for PM callbacks of all devices to complete. > + */ > +static void dpm_synchronize(void) > +{ > + struct device *dev; > + > + async_synchronize_full(); > + > + mutex_lock(&dpm_list_mtx); > + list_for_each_entry(dev, &dpm_list, power.entry) > + INIT_COMPLETION(dev->power.completion); > + mutex_unlock(&dpm_list_mtx); > +} I agree with Linus, initializing the completions here is weird. 
You should initialize them just before using them. > @@ -683,6 +786,7 @@ static int dpm_suspend(pm_message_t stat > > INIT_LIST_HEAD(&list); > mutex_lock(&dpm_list_mtx); > + pm_transition = state; > while (!list_empty(&dpm_list)) { > struct device *dev = to_device(dpm_list.prev); > > @@ -697,13 +801,18 @@ static int dpm_suspend(pm_message_t stat > put_device(dev); > break; > } > - dev->power.status = DPM_OFF; > if (!list_empty(&dev->power.entry)) > list_move(&dev->power.entry, &list); > put_device(dev); > + error = atomic_read(&async_error); > + if (error) > + break; > } > list_splice(&list, dpm_list.prev); Here's something you might want to do in a later patch. These awkward list-pointer manipulations can be simplified as follows: static bool dpm_iterate_forward; static struct device *dpm_next; In device_pm_remove(): mutex_lock(&dpm_list_mtx); if (dev == dpm_next) dpm_next = to_device(dpm_iterate_forward ? dev->power.entry.next : dev->power.entry.prev); list_del_init(&dev->power.entry); mutex_unlock(&dpm_list_mtx); In dpm_resume(): dpm_iterate_forward = true; list_for_each_entry_safe(dev, dpm_next, dpm_list, power.entry) { ... In dpm_suspend(): dpm_iterate_forward = false; list_for_each_entry_safe_reverse(dev, dpm_next, dpm_list, power.entry) { ... Whether this really is better is a matter of opinion; I like it. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-10 15:31 ` Alan Stern @ 2009-12-10 15:45 ` Linus Torvalds 2009-12-10 18:37 ` Alan Stern 2009-12-10 21:14 ` Rafael J. Wysocki 1 sibling, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-10 15:45 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Thu, 10 Dec 2009, Alan Stern wrote: > > In device_pm_remove(): > > mutex_lock(&dpm_list_mtx); > if (dev == dpm_next) > dpm_next = to_device(dpm_iterate_forward ? > dev->power.entry.next : dev->power.entry.prev); > list_del_init(&dev->power.entry); > mutex_unlock(&dpm_list_mtx); I'm really not seeing the point - it's much better to hardcode the ordering in the place you use it (where it is static and the compiler can generate better code) than to do some dynamic choice that depends on some fake flag - especially a global one. Also, quite frankly, error handling needs to be separated out of the whole async patch, and needs to be thought about a lot more. And I would seriously argue that if you have any async suspends, then those async suspends are _not_ allowed to fail. At least not initially. Having async failures and trying to fix them up is just a disaster. Which ones actually failed, and which ones were aborted before they even really got to their suspend routines? Which ones do you try to resume? IOW, it needs way more thought than what has clearly happened so far. And once more, I will refuse to merge anything that is complicated for no actual reason (where reason is "real life, and tested to make a big difference", not some hand-waving). Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-10 15:45 ` Linus Torvalds @ 2009-12-10 18:37 ` Alan Stern 2009-12-10 23:51 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-10 18:37 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Thu, 10 Dec 2009, Linus Torvalds wrote: > > > On Thu, 10 Dec 2009, Alan Stern wrote: > > > > In device_pm_remove(): > > > > mutex_lock(&dpm_list_mtx); > > if (dev == dpm_next) > > dpm_next = to_device(dpm_iterate_forward ? > > dev->power.entry.next : dev->power.entry.prev); > > list_del_init(&dev->power.entry); > > mutex_unlock(&dpm_list_mtx); > > I'm really not seeing the point - it's much better to hardcode the > ordering in the place you use it (where it is static and the compiler can > generate better code) than to do some dynamic choice that depends on some > fake flag - especially a global one. You probably didn't look closely at the original code in dpm_suspend() and dpm_resume(). It's very awkward; each device is removed from dpm_list, operated on, and then added on to a new local list. At the end the new list is spliced back into dpm_list. This approach is better because it doesn't involve changing any list pointers while the sleep transition is in progress. At any rate, I don't recommend doing it in the same patch as the async stuff; it should be done separately. Either before or after -- the two are independent. > Also, quite frankly, error handling needs to be separated out of the whole > async patch, and needs to be thought about a lot more. And I would > seriously argue that if you have any async suspends, then those async > suspends are _not_ allowed to fail. At least not initially. > > Having async failures and trying to fix them up is just a disaster. Which > ones actually failed, and which ones were aborted before they even really > got to their suspend routines? 
Which ones do you try to resume? We record the status of each device; dev->power.status stores different values depending on whether the device suspend succeeded or failed. The value will be correct and up-to-date after async_synchronize_full() returns. The value is used in dpm_resume() to decide which devices need their resume methods called. I don't see any problems there. > IOW, it needs way more thought than what has clearly happened so far. And > once more, I will refuse to merge anything that is complicated for no > actual reason (where reason is "real life, and tested to make a big > difference", not some hand-waving) I don't think the error handling requires more than minimal changes. The whole atomic_t thing was overkill. It probably stemmed from a discussion some time back with Pavel Machek about concurrent writes to a single variable. I claimed that concurrent writes to a properly aligned pointer, int, or long would never create a "mash-up"; that is, readers would see either the original value or one of the new values but never some weird combination of bits. Alan Cox pointed out that while this was technically correct, there's nothing to prevent the compiler from translating a = b + c; into something like: load b, R1 store R1, a load c, R1 add R1, a in which case readers might see the intermediate value. (Okay, the compiler would have to be pretty stupid to do this with such a simple expression, but it could happen with more complicated expressions.) Pavel favored always using atomic types when there could be concurrent writes, and apparently Rafael was following his advice. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
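The hazard Alan Cox pointed out (the compiler using the shared variable itself as scratch space while evaluating `a = b + c`) can be avoided by computing the value privately and publishing it with a single store that the compiler may not split. The fragment below is a userspace sketch using C11 atomics as a stand-in; it is not what the patch does (Rafael dropped the atomic_t in the follow-up), and `shared_value`, `publish_sum()` and `read_published()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Shared variable that concurrent readers may look at. */
static atomic_int shared_value;

/*
 * Compute b + c in a private temporary, then publish it with one store.
 * Readers see either the old value or the new one, never an intermediate
 * result such as "b" alone.
 */
static void publish_sum(int b, int c)
{
	int tmp = b + c;	/* private, invisible to other threads */

	atomic_store_explicit(&shared_value, tmp, memory_order_relaxed);
}

static int read_published(void)
{
	return atomic_load_explicit(&shared_value, memory_order_relaxed);
}

/* Usage: single-threaded demonstration of the publish/read pair. */
static int demo_publish(void)
{
	publish_sum(3, 4);
	return read_published();
}
```

Relaxed ordering is used here because the only property under discussion is that the store is indivisible; whether stronger ordering would be needed depends on what else the readers rely on.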
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) 2009-12-10 18:37 ` Alan Stern @ 2009-12-10 23:51 ` Linus Torvalds 0 siblings, 0 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-10 23:51 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Zhang Rui, LKML, ACPI Devel Maling List, pm list On Thu, 10 Dec 2009, Alan Stern wrote: > > You probably didn't look closely at the original code in dpm_suspend() > and dpm_resume(). It's very awkward; each device is removed from > dpm_list, operated on, and then added on to a new local list. At the > end the new list is spliced back into dpm_list. > > This approach is better because it doesn't involve changing any list > pointers while the sleep transition is in progress. At any rate, I > don't recommend doing it in the same patch as the async stuff; it > should be done separately. Either before or after -- the two are > independent. I do agree with the "independent" part. But I don't agree about the awkwardness per se. Sure, it moves things back and forth and has private lists, but that's actually a fairly standard thing to do in those kinds of situations where you're taking something off a list, operating on it, and may need to put it back on the same list eventually. The VM layer does similar things. So that's why I think your version was actually odder - the existing list manipulation isn't all that odd. It has that strange "did we get removed while we dropped the lock and tried to suspend the device" thing, of course, but that's not entirely unheard of either. Could it be done more cleanly? I think so, but I agree with you that it's likely a separate issue. I _suspect_, for example, that we could just do something like, the appended to avoid _some_ of the subtlety. 
IOW, just move the device to the local list early - and if it gets removed while being suspended, it will automatically get removed from the local list (the remover doesn't care _what_ list it is on when it does a 'list_del(power.entry)'). UNTESTED PATCH! This may be total crap, of course. But it _looks_ like an "ObviousCleanup(tm)" - famous last words. Linus

---
 drivers/base/power/main.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 8aa2443..f2bb493 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -687,6 +687,7 @@ static int dpm_suspend(pm_message_t state)
 		struct device *dev = to_device(dpm_list.prev);
 
 		get_device(dev);
+		list_move(&dev->power.entry, &list);
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_suspend(dev, state);
@@ -698,8 +699,6 @@ static int dpm_suspend(pm_message_t state)
 			break;
 		}
 		dev->power.status = DPM_OFF;
-		if (!list_empty(&dev->power.entry))
-			list_move(&dev->power.entry, &list);
 		put_device(dev);
 	}
 	list_splice(&list, dpm_list.prev);

^ permalink raw reply related [flat|nested] 235+ messages in thread
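The property Linus's cleanup relies on is that unlinking a node from a circular doubly-linked list only touches the node's neighbours, so the remover does not need to know which list the device is currently on. A minimal userspace model of that idea follows; the `list_node` type and the helper names are re-implemented here for illustration and are only loosely modelled on the kernel's list API, and the "device 1 is unregistered during its own suspend" scenario is invented for the demo.

```c
#include <assert.h>
#include <stdbool.h>

struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *h)
{
	h->next = h->prev = h;
}

static bool list_is_empty(const struct list_node *h)
{
	return h->next == h;
}

/* Insert n right after pos (pos may be a list head). */
static void list_add_after(struct list_node *n, struct list_node *pos)
{
	n->next = pos->next;
	n->prev = pos;
	pos->next->prev = n;
	pos->next = n;
}

/* Unlink n from whatever list it is on; only its neighbours are touched. */
static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static int list_count(const struct list_node *h)
{
	int count = 0;

	for (const struct list_node *p = h->next; p != h; p = p->next)
		count++;
	return count;
}

/*
 * Model of the suspend loop: take the last device, move it to the local
 * list *before* "suspending" it, and let device 1 be unregistered while
 * its suspend is in progress.
 */
static int demo_suspend_with_removal(void)
{
	struct list_node dpm, done, dev[3];

	list_init(&dpm);
	list_init(&done);
	for (int i = 0; i < 3; i++)
		list_add_after(&dev[i], dpm.prev);	/* append at the tail */

	while (!list_is_empty(&dpm)) {
		struct list_node *n = dpm.prev;	/* last device goes first */

		list_unlink(n);
		list_add_after(n, &done);	/* moved early */

		if (n == &dev[1])
			list_unlink(n);	/* removed while being suspended */
	}

	assert(list_is_empty(&dpm));
	return list_count(&done);	/* devices left to splice back */
}
```

In this model the removed device simply drops off the local list, so no "is it still on a list?" test is needed after the suspend call, and the two surviving devices end up on the local list in their original order.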
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-10 15:31 ` Alan Stern
  2009-12-10 15:45   ` Linus Torvalds
@ 2009-12-10 21:14   ` Rafael J. Wysocki
  2009-12-10 22:17     ` Alan Stern
  1 sibling, 1 reply; 235+ messages in thread
From: Rafael J. Wysocki @ 2009-12-10 21:14 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Thursday 10 December 2009, Alan Stern wrote:
> On Thu, 10 Dec 2009, Rafael J. Wysocki wrote:
>
> > > How about CONFIG_PROVE_LOCKING? If lockdep really does start
> > > complaining then switching to completions would be a simple way to
> > > appease it.
> >
> > Ah, that one is not set. I guess I'll try it later, although I've already
> > decided to use completions anyway.
>
> You should see how badly lockdep complains about the rwsems. If it
> really doesn't like them then using completions makes sense.

It does complain about them, but when the nested _down operations are marked
as nested, it stops complaining (that's in the version where there's no async
in the _noirq phases).

> > Index: linux-2.6/drivers/base/power/main.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/base/power/main.c
> > +++ linux-2.6/drivers/base/power/main.c
> > @@ -56,6 +58,7 @@ static bool transition_started;
> >  void device_pm_init(struct device *dev)
> >  {
> >  	dev->power.status = DPM_ON;
> > +	init_completion(&dev->power.completion);
> >  	pm_runtime_init(dev);
> >  }
>
> You need a matching complete_all() in device_pm_remove(), in case
> someone else is waiting for the device when it gets unregistered.

Right, added.

> > +/**
> > + * dpm_synchronize - Wait for PM callbacks of all devices to complete.
> > + */
> > +static void dpm_synchronize(void)
> > +{
> > +	struct device *dev;
> > +
> > +	async_synchronize_full();
> > +
> > +	mutex_lock(&dpm_list_mtx);
> > +	list_for_each_entry(dev, &dpm_list, power.entry)
> > +		INIT_COMPLETION(dev->power.completion);
> > +	mutex_unlock(&dpm_list_mtx);
> > +}
>
> I agree with Linus, initializing the completions here is weird. You
> should initialize them just before using them.

I removed that completely and now the INIT_COMPLETION() is always done in the
preceding phase.

> > @@ -683,6 +786,7 @@ static int dpm_suspend(pm_message_t stat
> >  
> >  	INIT_LIST_HEAD(&list);
> >  	mutex_lock(&dpm_list_mtx);
> > +	pm_transition = state;
> >  	while (!list_empty(&dpm_list)) {
> >  		struct device *dev = to_device(dpm_list.prev);
> >  
> > @@ -697,13 +801,18 @@ static int dpm_suspend(pm_message_t stat
> >  			put_device(dev);
> >  			break;
> >  		}
> > -		dev->power.status = DPM_OFF;
> >  		if (!list_empty(&dev->power.entry))
> >  			list_move(&dev->power.entry, &list);
> >  		put_device(dev);
> > +		error = atomic_read(&async_error);
> > +		if (error)
> > +			break;
> >  	}
> >  	list_splice(&list, dpm_list.prev);
>
> Here's something you might want to do in a later patch. These awkward
> list-pointer manipulations can be simplified as follows:

Well, I'm not sure if that's more straightforward. Anyway, as you said,
that's something for a different patch. :-)

Below is an updated version of the $subject one. I don't use the atomic_t
for async_error any more and (apart from this fixed issue) I don't see any
problems in the suspend error path now.
Rafael --- drivers/base/power/main.c | 113 ++++++++++++++++++++++++++++++++++++++++--- include/linux/device.h | 6 ++ include/linux/pm.h | 12 ++++ include/linux/resume-trace.h | 7 ++ 4 files changed, 131 insertions(+), 7 deletions(-) Index: linux-2.6/include/linux/pm.h =================================================================== --- linux-2.6.orig/include/linux/pm.h +++ linux-2.6/include/linux/pm.h @@ -26,6 +26,7 @@ #include <linux/spinlock.h> #include <linux/wait.h> #include <linux/timer.h> +#include <linux/completion.h> /* * Callbacks for platform drivers to implement. @@ -412,9 +413,11 @@ struct dev_pm_info { pm_message_t power_state; unsigned int can_wakeup:1; unsigned int should_wakeup:1; + unsigned async_suspend:1; enum dpm_state status; /* Owned by the PM core */ #ifdef CONFIG_PM_SLEEP struct list_head entry; + struct completion completion; #endif #ifdef CONFIG_PM_RUNTIME struct timer_list suspend_timer; @@ -508,6 +511,13 @@ extern void __suspend_report_result(cons __suspend_report_result(__func__, fn, ret); \ } while (0) +extern int __dpm_wait(struct device *dev, void *ign); + +static inline void dpm_wait(struct device *dev) +{ + __dpm_wait(dev, NULL); +} + #else /* !CONFIG_PM_SLEEP */ #define device_pm_lock() do {} while (0) @@ -520,6 +530,8 @@ static inline int dpm_suspend_start(pm_m #define suspend_report_result(fn, ret) do {} while (0) +static inline void dpm_wait(struct device *dev) {} + #endif /* !CONFIG_PM_SLEEP */ /* How to reorder dpm_list after device_move() */ Index: linux-2.6/drivers/base/power/main.c =================================================================== --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -25,6 +25,7 @@ #include <linux/resume-trace.h> #include <linux/rwsem.h> #include <linux/interrupt.h> +#include <linux/async.h> #include "../base.h" #include "power.h" @@ -42,6 +43,7 @@ LIST_HEAD(dpm_list); static DEFINE_MUTEX(dpm_list_mtx); +static pm_message_t pm_transition; /* * Set once 
the preparation of devices for a PM transition has started, reset @@ -56,6 +58,7 @@ static bool transition_started; void device_pm_init(struct device *dev) { dev->power.status = DPM_ON; + init_completion(&dev->power.completion); pm_runtime_init(dev); } @@ -111,6 +114,7 @@ void device_pm_remove(struct device *dev pr_debug("PM: Removing info for %s:%s\n", dev->bus ? dev->bus->name : "No Bus", kobject_name(&dev->kobj)); + complete_all(&dev->power.completion); mutex_lock(&dpm_list_mtx); list_del_init(&dev->power.entry); mutex_unlock(&dpm_list_mtx); @@ -162,6 +166,24 @@ void device_pm_move_last(struct device * } /** + * __dpm_wait - Wait for a PM operation to complete. + * @dev: Device to wait for. + * @ign: This value is not used by the function. + */ +int __dpm_wait(struct device *dev, void *ign) +{ + if (dev) + wait_for_completion(&dev->power.completion); + return 0; +} +EXPORT_SYMBOL_GPL(__dpm_wait); + +static void dpm_wait_for_children(struct device *dev) +{ + device_for_each_child(dev, NULL, __dpm_wait); +} + +/** * pm_op - Execute the PM operation appropriate for given PM event. * @dev: Device to handle. * @ops: PM operations to choose from. @@ -366,7 +388,7 @@ void dpm_resume_noirq(pm_message_t state mutex_lock(&dpm_list_mtx); transition_started = false; - list_for_each_entry(dev, &dpm_list, power.entry) + list_for_each_entry(dev, &dpm_list, power.entry) { if (dev->power.status > DPM_OFF) { int error; @@ -375,23 +397,27 @@ void dpm_resume_noirq(pm_message_t state if (error) pm_dev_err(dev, state, " early", error); } + /* Needed by the subsequent dpm_resume(). */ + INIT_COMPLETION(dev->power.completion); + } mutex_unlock(&dpm_list_mtx); resume_device_irqs(); } EXPORT_SYMBOL_GPL(dpm_resume_noirq); /** - * device_resume - Execute "resume" callbacks for given device. + * __device_resume - Execute "resume" callbacks for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. 
*/ -static int device_resume(struct device *dev, pm_message_t state) +static int __device_resume(struct device *dev, pm_message_t state) { int error = 0; TRACE_DEVICE(dev); TRACE_RESUME(0); + dpm_wait(dev->parent); down(&dev->sem); if (dev->bus) { @@ -426,11 +452,34 @@ static int device_resume(struct device * } End: up(&dev->sem); + complete_all(&dev->power.completion); TRACE_RESUME(error); return error; } +static void async_resume(void *data, async_cookie_t cookie) +{ + struct device *dev = (struct device *)data; + int error; + + error = __device_resume(dev, pm_transition); + if (error) + pm_dev_err(dev, pm_transition, " async", error); + put_device(dev); +} + +static int device_resume(struct device *dev) +{ + if (dev->power.async_suspend && !pm_trace_is_enabled()) { + get_device(dev); + async_schedule(async_resume, dev); + return 0; + } + + return __device_resume(dev, pm_transition); +} + /** * dpm_resume - Execute "resume" callbacks for non-sysdev devices. * @state: PM transition of the system being carried out. @@ -444,6 +493,7 @@ static void dpm_resume(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); + pm_transition = state; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.next); @@ -454,7 +504,7 @@ static void dpm_resume(pm_message_t stat dev->power.status = DPM_RESUMING; mutex_unlock(&dpm_list_mtx); - error = device_resume(dev, state); + error = device_resume(dev); mutex_lock(&dpm_list_mtx); if (error) @@ -469,6 +519,7 @@ static void dpm_resume(pm_message_t stat } list_splice(&list, &dpm_list); mutex_unlock(&dpm_list_mtx); + async_synchronize_full(); } /** @@ -623,15 +674,18 @@ int dpm_suspend_noirq(pm_message_t state } EXPORT_SYMBOL_GPL(dpm_suspend_noirq); +static int async_error; + /** * device_suspend - Execute "suspend" callbacks for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. 
*/ -static int device_suspend(struct device *dev, pm_message_t state) +static int __device_suspend(struct device *dev, pm_message_t state) { int error = 0; + dpm_wait_for_children(dev); down(&dev->sem); if (dev->class) { @@ -666,12 +720,48 @@ static int device_suspend(struct device suspend_report_result(dev->bus->suspend, error); } } + + if (!error) + dev->power.status = DPM_OFF; + End: up(&dev->sem); + complete_all(&dev->power.completion); return error; } +static void async_suspend(void *data, async_cookie_t cookie) +{ + struct device *dev = (struct device *)data; + int error; + + if (async_error) { + complete_all(&dev->power.completion); + goto End; + } + + error = __device_suspend(dev, pm_transition); + if (error) { + pm_dev_err(dev, pm_transition, " async", error); + async_error = error; + } + + End: + put_device(dev); +} + +static int device_suspend(struct device *dev, pm_message_t state) +{ + if (dev->power.async_suspend) { + get_device(dev); + async_schedule(async_suspend, dev); + return 0; + } + + return __device_suspend(dev, pm_transition); +} + /** * dpm_suspend - Execute "suspend" callbacks for all non-sysdev devices. * @state: PM transition of the system being carried out. 
@@ -683,6 +773,7 @@ static int dpm_suspend(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); + pm_transition = state; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.prev); @@ -697,13 +788,17 @@ static int dpm_suspend(pm_message_t stat put_device(dev); break; } - dev->power.status = DPM_OFF; if (!list_empty(&dev->power.entry)) list_move(&dev->power.entry, &list); put_device(dev); + if (async_error) + break; } list_splice(&list, dpm_list.prev); mutex_unlock(&dpm_list_mtx); + async_synchronize_full(); + if (!error) + error = async_error; return error; } @@ -762,6 +857,7 @@ static int dpm_prepare(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); transition_started = true; + async_error = 0; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.next); @@ -793,8 +889,11 @@ static int dpm_prepare(pm_message_t stat break; } dev->power.status = DPM_SUSPENDING; - if (!list_empty(&dev->power.entry)) + if (!list_empty(&dev->power.entry)) { list_move_tail(&dev->power.entry, &list); + /* Needed by the subsequent dpm_suspend(). 
*/ + INIT_COMPLETION(dev->power.completion); + } put_device(dev); } list_splice(&list, &dpm_list); Index: linux-2.6/include/linux/resume-trace.h =================================================================== --- linux-2.6.orig/include/linux/resume-trace.h +++ linux-2.6/include/linux/resume-trace.h @@ -6,6 +6,11 @@ extern int pm_trace_enabled; +static inline int pm_trace_is_enabled(void) +{ + return pm_trace_enabled; +} + struct device; extern void set_trace_device(struct device *); extern void generate_resume_trace(const void *tracedata, unsigned int user); @@ -17,6 +22,8 @@ extern void generate_resume_trace(const #else +static inline int pm_trace_is_enabled(void) { return 0; } + #define TRACE_DEVICE(dev) do { } while (0) #define TRACE_RESUME(dev) do { } while (0) Index: linux-2.6/include/linux/device.h =================================================================== --- linux-2.6.orig/include/linux/device.h +++ linux-2.6/include/linux/device.h @@ -472,6 +472,12 @@ static inline int device_is_registered(s return dev->kobj.state_in_sysfs; } +static inline void device_enable_async_suspend(struct device *dev, bool enable) +{ + if (dev->power.status == DPM_ON) + dev->power.async_suspend = enable; +} + void driver_init(void); /* ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-10 21:14 ` Rafael J. Wysocki
@ 2009-12-10 22:17   ` Alan Stern
  2009-12-10 23:45     ` Rafael J. Wysocki
  0 siblings, 1 reply; 235+ messages in thread
From: Alan Stern @ 2009-12-10 22:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Thu, 10 Dec 2009, Rafael J. Wysocki wrote:

> > You should see how badly lockdep complains about the rwsems. If it
> > really doesn't like them then using completions makes sense.
>
> It does complain about them, but when the nested _down operations are marked
> as nested, it stops complaining (that's in the version where there's no async
> in the _noirq phases).

Did you set the async_suspend flag for any devices during the test?
And did you run more than one suspend/resume cycle?

> +extern int __dpm_wait(struct device *dev, void *ign);
> +
> +static inline void dpm_wait(struct device *dev)
> +{
> +	__dpm_wait(dev, NULL);
> +}

Sorry, I intended to mention this before but forgot. This design is
inelegant. You shouldn't have inlines calling functions with extra
unused arguments; they just waste code space. Make dpm_wait() be a
real routine and add a shim to the device_for_each_child() loop.

> @@ -366,7 +388,7 @@ void dpm_resume_noirq(pm_message_t state
>  
>  	mutex_lock(&dpm_list_mtx);
>  	transition_started = false;
> -	list_for_each_entry(dev, &dpm_list, power.entry)
> +	list_for_each_entry(dev, &dpm_list, power.entry) {
>  		if (dev->power.status > DPM_OFF) {
>  			int error;
>  
> @@ -375,23 +397,27 @@ void dpm_resume_noirq(pm_message_t state
>  			if (error)
>  				pm_dev_err(dev, state, " early", error);
>  		}
> +		/* Needed by the subsequent dpm_resume(). */
> +		INIT_COMPLETION(dev->power.completion);

You're still doing it. Don't initialize the completions in a totally
different phase! Initialize them directly before they are used.
Namely, at the start of device_resume() and device_suspend().

One more thing. A logical time to check for errors is just after
waiting for the children in __device_suspend(), instead of beforehand
in async_suspend(). After all, if an error occurs then it's likely to
happen while we are waiting.

Alan Stern

^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-10 22:17 ` Alan Stern
@ 2009-12-10 23:45   ` Rafael J. Wysocki
  0 siblings, 0 replies; 235+ messages in thread
From: Rafael J. Wysocki @ 2009-12-10 23:45 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Zhang Rui, LKML, ACPI Devel Maling List, pm list

On Thursday 10 December 2009, Alan Stern wrote:
> On Thu, 10 Dec 2009, Rafael J. Wysocki wrote:
>
> > > You should see how badly lockdep complains about the rwsems. If it
> > > really doesn't like them then using completions makes sense.
> >
> > It does complain about them, but when the nested _down operations are marked
> > as nested, it stops complaining (that's in the version where there's no async
> > in the _noirq phases).
>
> Did you set the async_suspend flag for any devices during the test?

Yes. All ACPI, all PCI, all serio, as usual. ;-)

> And did you run more than one suspend/resume cycle?

Sure. Actually, I test it in the /sys/power/pm_test = core mode, but that
shouldn't really matter.

> > +extern int __dpm_wait(struct device *dev, void *ign);
> > +
> > +static inline void dpm_wait(struct device *dev)
> > +{
> > +	__dpm_wait(dev, NULL);
> > +}
>
> Sorry, I intended to mention this before but forgot. This design is
> inelegant. You shouldn't have inlines calling functions with extra
> unused arguments; they just waste code space. Make dpm_wait() be a
> real routine and add a shim to the device_for_each_child() loop.

I thought about that myself, done now.

> > @@ -366,7 +388,7 @@ void dpm_resume_noirq(pm_message_t state
> >  
> >  	mutex_lock(&dpm_list_mtx);
> >  	transition_started = false;
> > -	list_for_each_entry(dev, &dpm_list, power.entry)
> > +	list_for_each_entry(dev, &dpm_list, power.entry) {
> >  		if (dev->power.status > DPM_OFF) {
> >  			int error;
> >  
> > @@ -375,23 +397,27 @@ void dpm_resume_noirq(pm_message_t state
> >  			if (error)
> >  				pm_dev_err(dev, state, " early", error);
> >  		}
> > +		/* Needed by the subsequent dpm_resume(). */
> > +		INIT_COMPLETION(dev->power.completion);
>
> You're still doing it. Don't initialize the completions in a totally
> different phase! Initialize them directly before they are used.
> Namely, at the start of device_resume() and device_suspend().

The idea was to initialize them all at the same time, before entering the
phase in which they were used, but I came to the conclusion that this was not
necessary, because the dpm_list ordering was such that the devices to be
waited for would always have their completions reinitialized before starting
__device_suspend() or __device_resume() for the waiting ones.

> One more thing. A logical time to check for errors is just after
> waiting for the children in __device_suspend(), instead of beforehand
> in async_suspend(). After all, if an error occurs then it's likely to
> happen while we are waiting.

Good idea, done.

Updated patch is appended.

Rafael

---
 drivers/base/power/main.c    |  106 ++++++++++++++++++++++++++++++++++++++++---
 include/linux/device.h       |    6 ++
 include/linux/pm.h           |    7 ++
 include/linux/resume-trace.h |    7 ++
 4 files changed, 121 insertions(+), 5 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -26,6 +26,7 @@
 #include <linux/spinlock.h>
 #include <linux/wait.h>
 #include <linux/timer.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -412,9 +413,11 @@ struct dev_pm_info { pm_message_t power_state; unsigned int can_wakeup:1; unsigned int should_wakeup:1; + unsigned async_suspend:1; enum dpm_state status; /* Owned by the PM core */ #ifdef CONFIG_PM_SLEEP struct list_head entry; + struct completion completion; #endif #ifdef CONFIG_PM_RUNTIME struct timer_list suspend_timer; @@ -508,6 +511,8 @@ extern void __suspend_report_result(cons __suspend_report_result(__func__, fn, ret); \ } while (0) +extern void dpm_wait(struct device *dev); + #else /* !CONFIG_PM_SLEEP */ #define device_pm_lock() do {} while (0) @@ -520,6 +525,8 @@ static inline int dpm_suspend_start(pm_m #define suspend_report_result(fn, ret) do {} while (0) +static inline void dpm_wait(struct device *dev) {} + #endif /* !CONFIG_PM_SLEEP */ /* How to reorder dpm_list after device_move() */ Index: linux-2.6/drivers/base/power/main.c =================================================================== --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -25,6 +25,7 @@ #include <linux/resume-trace.h> #include <linux/rwsem.h> #include <linux/interrupt.h> +#include <linux/async.h> #include "../base.h" #include "power.h" @@ -42,6 +43,7 @@ LIST_HEAD(dpm_list); static DEFINE_MUTEX(dpm_list_mtx); +static pm_message_t pm_transition; /* * Set once the preparation of devices for a PM transition has started, reset @@ -56,6 +58,7 @@ static bool transition_started; void device_pm_init(struct device *dev) { dev->power.status = DPM_ON; + init_completion(&dev->power.completion); pm_runtime_init(dev); } @@ -111,6 +114,7 @@ void device_pm_remove(struct device *dev pr_debug("PM: Removing info for %s:%s\n", dev->bus ? dev->bus->name : "No Bus", kobject_name(&dev->kobj)); + complete_all(&dev->power.completion); mutex_lock(&dpm_list_mtx); list_del_init(&dev->power.entry); mutex_unlock(&dpm_list_mtx); @@ -162,6 +166,28 @@ void device_pm_move_last(struct device * } /** + * dpm_wait - Wait for a PM operation to complete. 
+ * @dev: Device to wait for. + */ +void dpm_wait(struct device *dev) +{ + if (dev) + wait_for_completion(&dev->power.completion); +} +EXPORT_SYMBOL_GPL(dpm_wait); + +static int dpm_wait_fn(struct device *dev, void *ignore) +{ + dpm_wait(dev); + return 0; +} + +static void dpm_wait_for_children(struct device *dev) +{ + device_for_each_child(dev, NULL, dpm_wait_fn); +} + +/** * pm_op - Execute the PM operation appropriate for given PM event. * @dev: Device to handle. * @ops: PM operations to choose from. @@ -381,17 +407,18 @@ void dpm_resume_noirq(pm_message_t state EXPORT_SYMBOL_GPL(dpm_resume_noirq); /** - * device_resume - Execute "resume" callbacks for given device. + * __device_resume - Execute "resume" callbacks for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. */ -static int device_resume(struct device *dev, pm_message_t state) +static int __device_resume(struct device *dev, pm_message_t state) { int error = 0; TRACE_DEVICE(dev); TRACE_RESUME(0); + dpm_wait(dev->parent); down(&dev->sem); if (dev->bus) { @@ -426,11 +453,34 @@ static int device_resume(struct device * } End: up(&dev->sem); + complete_all(&dev->power.completion); TRACE_RESUME(error); return error; } +static void async_resume(void *data, async_cookie_t cookie) +{ + struct device *dev = (struct device *)data; + int error; + + error = __device_resume(dev, pm_transition); + if (error) + pm_dev_err(dev, pm_transition, " async", error); + put_device(dev); +} + +static int device_resume(struct device *dev) +{ + if (dev->power.async_suspend && !pm_trace_is_enabled()) { + get_device(dev); + async_schedule(async_resume, dev); + return 0; + } + + return __device_resume(dev, pm_transition); +} + /** * dpm_resume - Execute "resume" callbacks for non-sysdev devices. * @state: PM transition of the system being carried out. 
@@ -444,6 +494,7 @@ static void dpm_resume(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); + pm_transition = state; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.next); @@ -451,10 +502,11 @@ static void dpm_resume(pm_message_t stat if (dev->power.status >= DPM_OFF) { int error; + INIT_COMPLETION(dev->power.completion); dev->power.status = DPM_RESUMING; mutex_unlock(&dpm_list_mtx); - error = device_resume(dev, state); + error = device_resume(dev); mutex_lock(&dpm_list_mtx); if (error) @@ -469,6 +521,7 @@ static void dpm_resume(pm_message_t stat } list_splice(&list, &dpm_list); mutex_unlock(&dpm_list_mtx); + async_synchronize_full(); } /** @@ -623,17 +676,23 @@ int dpm_suspend_noirq(pm_message_t state } EXPORT_SYMBOL_GPL(dpm_suspend_noirq); +static int async_error; + /** * device_suspend - Execute "suspend" callbacks for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. */ -static int device_suspend(struct device *dev, pm_message_t state) +static int __device_suspend(struct device *dev, pm_message_t state) { int error = 0; + dpm_wait_for_children(dev); down(&dev->sem); + if (async_error) + goto End; + if (dev->class) { if (dev->class->pm) { pm_dev_dbg(dev, state, "class "); @@ -666,12 +725,42 @@ static int device_suspend(struct device suspend_report_result(dev->bus->suspend, error); } } + + if (!error) + dev->power.status = DPM_OFF; + End: up(&dev->sem); + complete_all(&dev->power.completion); return error; } +static void async_suspend(void *data, async_cookie_t cookie) +{ + struct device *dev = (struct device *)data; + int error; + + error = __device_suspend(dev, pm_transition); + if (error) { + pm_dev_err(dev, pm_transition, " async", error); + async_error = error; + } + + put_device(dev); +} + +static int device_suspend(struct device *dev, pm_message_t state) +{ + if (dev->power.async_suspend) { + get_device(dev); + async_schedule(async_suspend, dev); + return 0; + } + + 
return __device_suspend(dev, pm_transition); +} + /** * dpm_suspend - Execute "suspend" callbacks for all non-sysdev devices. * @state: PM transition of the system being carried out. @@ -683,10 +772,12 @@ static int dpm_suspend(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); + pm_transition = state; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.prev); get_device(dev); + INIT_COMPLETION(dev->power.completion); mutex_unlock(&dpm_list_mtx); error = device_suspend(dev, state); @@ -697,13 +788,17 @@ static int dpm_suspend(pm_message_t stat put_device(dev); break; } - dev->power.status = DPM_OFF; if (!list_empty(&dev->power.entry)) list_move(&dev->power.entry, &list); put_device(dev); + if (async_error) + break; } list_splice(&list, dpm_list.prev); mutex_unlock(&dpm_list_mtx); + async_synchronize_full(); + if (!error) + error = async_error; return error; } @@ -762,6 +857,7 @@ static int dpm_prepare(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); transition_started = true; + async_error = 0; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.next); Index: linux-2.6/include/linux/resume-trace.h =================================================================== --- linux-2.6.orig/include/linux/resume-trace.h +++ linux-2.6/include/linux/resume-trace.h @@ -6,6 +6,11 @@ extern int pm_trace_enabled; +static inline int pm_trace_is_enabled(void) +{ + return pm_trace_enabled; +} + struct device; extern void set_trace_device(struct device *); extern void generate_resume_trace(const void *tracedata, unsigned int user); @@ -17,6 +22,8 @@ extern void generate_resume_trace(const #else +static inline int pm_trace_is_enabled(void) { return 0; } + #define TRACE_DEVICE(dev) do { } while (0) #define TRACE_RESUME(dev) do { } while (0) Index: linux-2.6/include/linux/device.h =================================================================== --- linux-2.6.orig/include/linux/device.h +++ 
linux-2.6/include/linux/device.h @@ -472,6 +472,12 @@ static inline int device_is_registered(s return dev->kobj.state_in_sysfs; } +static inline void device_enable_async_suspend(struct device *dev, bool enable) +{ + if (dev->power.status == DPM_ON) + dev->power.async_suspend = enable; +} + void driver_init(void); /* ^ permalink raw reply [flat|nested] 235+ messages in thread
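[Editor's sketch, not part of the original message.] The ordering rule used by __device_suspend() in the patch above (wait for every child's completion, do the work, then complete_all() your own) can be tried out in userspace. The code below is only an analogy, assuming POSIX threads in place of the kernel's async infrastructure: struct completion here is a hand-written stand-in built from a mutex and a condition variable, and struct device, async_suspend() and run_completion_demo() are hypothetical names that merely echo the patch. Error propagation through async_error is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

/* Wake every current and future waiter. */
static void complete_all(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

#define MAX_CHILDREN 4

struct device {
	struct device *children[MAX_CHILDREN];
	size_t nr_children;
	struct completion completion;
	int suspend_seq;	/* position in the overall suspend order */
};

static pthread_mutex_t seq_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_seq;

/* Thread body: suspend one device once all of its children are done. */
static void *async_suspend(void *data)
{
	struct device *dev = data;
	size_t i;

	assert(dev->nr_children <= MAX_CHILDREN);
	for (i = 0; i < dev->nr_children; i++)
		wait_for_completion(&dev->children[i]->completion);

	/* The device's real suspend work would run here. */
	pthread_mutex_lock(&seq_lock);
	dev->suspend_seq = next_seq++;
	pthread_mutex_unlock(&seq_lock);

	complete_all(&dev->completion);
	return NULL;
}

static void start_thread(pthread_t *t, struct device *dev)
{
	if (pthread_create(t, NULL, async_suspend, dev) != 0)
		abort();
}

/* Returns 1 if the parent was suspended after both children, else 0. */
int run_completion_demo(void)
{
	struct device kids[2] = { { .nr_children = 0 }, { .nr_children = 0 } };
	struct device parent = {
		.children = { &kids[0], &kids[1] },
		.nr_children = 2,
	};
	pthread_t threads[3];
	int i;

	init_completion(&parent.completion);
	init_completion(&kids[0].completion);
	init_completion(&kids[1].completion);
	next_seq = 0;

	/* Start the parent first: the start order must not matter. */
	start_thread(&threads[0], &parent);
	start_thread(&threads[1], &kids[0]);
	start_thread(&threads[2], &kids[1]);

	for (i = 0; i < 3; i++)
		pthread_join(threads[i], NULL);

	return parent.suspend_seq == 2;
}
```

The two children may finish in either order, but the parent always records the last sequence number, since it cannot pass its wait loop until both children have called complete_all(). On older toolchains the sketch may need to be built with -pthread.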
* Re: [GIT PULL] PM updates for 2.6.33
  2009-12-07  5:57 ` Linus Torvalds
                     ` (2 preceding siblings ...)
  2009-12-07 15:13   ` Alan Stern
@ 2009-12-07 15:15   ` Rafael J. Wysocki
  2009-12-07 16:37     ` Linus Torvalds
  3 siblings, 1 reply; 235+ messages in thread
From: Rafael J. Wysocki @ 2009-12-07 15:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Zhang Rui, Alan Stern, LKML, ACPI Devel Maling List, pm list

On Monday 07 December 2009, Linus Torvalds wrote:
>
> On Mon, 7 Dec 2009, Zhang Rui wrote:
> >
> > Hi, Linus,
> > can you please look at this patch set and see if the idea is right?
> > http://marc.info/?l=linux-kernel&m=124840449826386&w=2
> > http://marc.info/?l=linux-acpi&m=124840456826456&w=2
> > http://marc.info/?l=linux-acpi&m=124840456926459&w=2
> > http://marc.info/?l=linux-acpi&m=124840457026468&w=2
> > http://marc.info/?l=linux-acpi&m=124840457126471&w=2
>
> So I'm not entirely sure about that patch-set, but the thing I like about
> it is how drivers really sign up to it one by one, rather than having all
> PCI devices automatically signed up for async behavior.
>
> That said, the thing I don't like about it is some of the same thing I
> don't necessarily like about the series in Rafael's tree either:

Just for the record, it's not in there any more.

> it looks rather over-designed with the whole infrastructure for async device logic
> (your patch in http://marc.info/?l=linux-acpi&m=124840456926459&w=2). How
> would you explain that whole async_dev_register() logic in simple terms to
> somebody else?
>
> (I think yours is simpler than the one in the PM tree, but I dunno. I've
> not really compared the two).
>
> So let me explain my dislike by trying to outline some conceptually simple
> thing that doesn't have any call-backs, doesn't have any "classes",
> doesn't require registration etc. It just allows drivers at any level to
> decide to do some things (not necessarily everything) asynchronously.
>
> Here's the outline:
>
>  - first off: drivers that don't know that they nest clearly don't do
>    anything asynchronous. No "PCI devices can be done in parallel" crap,
>    because they really can't - not in the general case. So just forget
>    about that kind of logic entirely: it's just wrong.
>
>  - the 'suspend' thing is a depth-first tree walk. As we suspend a node,
>    we first suspend the child nodes, and then we suspend the node itself.
>    Everybody agrees about that, right?
>
>  - Trivial "async rule": the tree is walked synchronously, but as we walk
>    it, any point in the tree may decide to do some or all of its suspend
>    asynchronously. For example, when we hit a disk node, the disk driver
>    may just decide that (a) it knows that the disk is an independent thing
>    and (b) it's hierarchical wrt its parent so (c) it can do the disk
>    suspend asynchronously.
>
>  - To protect against a parent node being suspended before any async child
>    work has completed, the child suspend - before it kicks off the actual
>    async work - just needs to take a read-lock on the parent (read-lock,
>    because you may have multiple children sharing a parent, and they don't
>    lock each other out). Then the only thing the asynchronous code needs
>    to do is to release the read lock when it is done.
>
>  - Now, the rule just becomes that the parent has to take a write lock on
>    itself when it suspends itself. That will automatically block until
>    all children are done.
>
> Doesn't the above sound _simple_?

I don't think the idea is really that much simpler than the one behind the
patchset you've just rejected. The only real difference is that in that
patchset the entire suspend and resume callbacks could be either asynchronous
or synchronous and in your approach each callback may be divided into the
synchronous and asynchronous part, which admittedly is more flexible, but not
necessarily simpler.

Now, apart from the idea there are some details that need to be taken into
consideration like the fact that the children may not be the only devices you
need to wait for with the parent suspend and that implies additional locking
rules. But you need to know which devices to lock and that has to be
represented somehow (the PM links in my patchset were for this and _nothing_
else).

Also, it looks like the parent locking should rather be done at the core
level, as it appears to be a piece of code that needs to be called for each
device:

	if (I_have_children || I_have_other_dependent_devices)
		write_lock_myself

> Now, the problem remains that when you walk the device tree starting off
> all these potentially asynchronous events, you don't want to do that
> serialization part (the "parent suspend") as you walk the tree - because
> then you would only ever do one single level asynchronously. Which is why
> I suggested splitting the suspend into a "pre-suspend" phase (and a
> "post-resume" one). Because then the tree walk goes from
>
> 	# single depth-first thing
> 	suspend(root)
> 	{
> 		for_each_child(root) {
> 			// This may take the parent lock for
> 			// reading if it does something async
> 			suspend(child);
> 		}
>
> 		// This serializes with any async children
> 		write_lock(root->lock);
> 		suspend_one_node(root);
> 		write_unlock(root->lock);
> 	}
>
> to
>
> 	# Phase one: walk the tree synchronously, starting any
> 	# async work on the leaves
> 	suspend_prepare(root)
> 	{
> 		for_each_child(root) {
> 			// This may take the parent lock for
> 			// reading if it does something async
> 			suspend_prepare(child);
> 		}
> 		suspend_prepare_one_node(root);
> 	}
>
> 	# Phase two: walk the tree synchronously, waiting for
> 	# and finishing the suspend
> 	suspend(root)
> 	{
> 		for_each_child(root) {
> 			suspend(child);
> 		}
> 		// This serializes with any async children started in phase 1
> 		write_lock(root->lock);
> 		suspend_one_node(root);
> 		write_unlock(root->lock);
> 	}
>
> and I really think this should work.

We already have prepare and complete suspend callbacks, for a different
reason, and I'm not sure they're suitable for doing the async thing. So,
we'd need to add another two callbacks, just for suspend to RAM, and what
about hibernation? Isn't that going to become a bit too complicated?

> The advantage: untouched drivers don't change ANY SEMANTICS AT ALL.

This also was true for my patchset.

> If they don't have a 'suspend_prepare()' function, then they still see that
> exact same sequence of 'suspend()' calls.

The same held for drivers without the async_suspend flag set in my patchset
(I really should have left setting it to individual drivers).

> In fact, even if they have children that _do_ have drivers that have that
> async phase, they'll never know, because that simple write-semaphore
> trivially guarantees that whether there was async work or not, it will be
> completed by the time we call 'suspend()'.

Ditto.

> And drivers that want to do things asynchronously don't need to register
> or worry: all they do is literally
>
>  - move their 'suspend()' function to 'suspend_prepare()' instead
>
>  - add a
>
> 	down_read(dev->parent->lock);
> 	async_run(mysuspend, dev);
>
>    to the point that they want to be asynchronous (which may be _all_ of
>    it or just some slow part). The 'mysuspend' part would be the async
>    part.
>
>  - add a
>
> 	up_read(dev->parent->lock);
>
>    to the end of their asynchronous 'mysuspend()' function, so that when
>    the child has finished suspending, the parent down_write() will finally
>    succeed.

In my patchset the drivers didn't need to do all that stuff. The only thing
they needed, if they wanted their suspend/resume to be executed
asynchronously, was to set the async_suspend flag.

But this is just for the record, in case you end up with code that's more
complicated than the rejected one.

Rafael

^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 15:15 ` [GIT PULL] PM updates for 2.6.33 Rafael J. Wysocki @ 2009-12-07 16:37 ` Linus Torvalds 2009-12-07 20:47 ` Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-07 16:37 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Zhang Rui, Alan Stern, LKML, ACPI Devel Maling List, pm list On Mon, 7 Dec 2009, Rafael J. Wysocki wrote: > > > The advantage: untouched drivers don't change ANY SEMANTICS AT ALL. > > This also was true for my patchset. That's simply not true. You set async_suspend on every single PCI driver. I object very heavily to it. You also introduce this whole big "callback when ready", and "non-topological PM dependency chain" thing. Which I also object to. Notice how with the simpler "lock parent" model, you _can_ actually encode non-topological dependencies, but you do it by simply read-locking whatever other independent device you want. So if an architecture has some system devices that have odd rules, that architecture can simply encode those rules in its suspend() functions. It doesn't need to expose it to the device layer - because the device layer won't even care. The code will just automatically "do the right thing" without even having that notion of PM dependencies at any other level than the driver that knows about it. No registration, no callbacks, no nothing. > In my patchset the drivers didn't need to do all that stuff. The only thing > they needed, if they wanted their suspend/resume to be executed > asynchronously, was to set the async_suspend flag. In my patchset, the drivers don't need to either. The _only_ thing that would do this is something like the USB layer. We're talking ten lines of code or so. And you get rid of all the PM dependencies and all the infrastructure - because the model is so simple that it doesn't need any. 
(Well, except for the infrastructure to run things asynchronously, but that was kind of my point from the very beginning: we can just re-use all that existing async infrastructure. We already have that). Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 16:37 ` Linus Torvalds @ 2009-12-07 20:47 ` Rafael J. Wysocki 2009-12-07 20:56 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-07 20:47 UTC (permalink / raw) To: Linus Torvalds Cc: Zhang Rui, Alan Stern, LKML, ACPI Devel Maling List, pm list On Monday 07 December 2009, Linus Torvalds wrote: > > On Mon, 7 Dec 2009, Rafael J. Wysocki wrote: > > > > > The advantage: untouched drivers don't change ANY SEMANTICS AT ALL. > > > > This also was true for my patchset. > > That's simply not true. > > You set async_suspend on every single PCI driver. I object very heavily to > it. That was a mistake, I admit. However, it was done in a separate patch that (1) was not necessary and (2) shouldn't have been there. Sorry for making the mistake of including that in the patchset. So I understand your objection to that and let's not get back to this again, ok? > You also introduce this whole big "callback when ready", and > "non-topological PM dependency chain" thing. Which I also object to. These things are also non-essential. Actually, they weren't there in the initial version of my patches and were added after people had complained that it had not been parallel enough and hadn't taken the off-tree dependencies into account. I could remove these things too, and quite easily. > Notice how with the simpler "lock parent" model, you _can_ actually encode > non-topological dependencies, but you do it by simply read-locking > whatever other independent device you want. So if an architecture has some > system devices that have odd rules, that architecture can simply encode > those rules in its suspend() functions. I'm not arguing against that. In fact, my only worry was the additional suspend/resume callbacks, which I really wouldn't like to introduce. But since you've found a way of doing things without them, I'm totally fine with this approach. 
> It doesn't need to expose it to the device layer - because the device > layer won't even care. The code will just automatically "do the right > thing" without even having that notion of PM dependencies at any other > level than the driver that knows about it. > > No registration, no callbacks, no nothing. > > > In my patchset the drivers didn't need to do all that stuff. The only thing > > they needed, if they wanted their suspend/resume to be executed > > asynchronously, was to set the async_suspend flag. > > In my patchset, the drivers don't need to either. > > The _only_ thing that would do this is something like the USB layer. We're > talking ten lines of code or so. And you get rid of all the PM > dependencies and all the infrastructure - because the model is so simple > that it doesn't need any. It just uses a different way of representing these things, perhaps more efficiently. > (Well, except for the infrastructure to run things asynchronously, but > that was kind of my point from the very beginning: we can just re-use all > that existing async infrastructure. We already have that). So I guess the only thing we need at the core level is to call async_synchronize_full() after every stage of suspend/resume, right? Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33
  2009-12-07 20:47 ` Rafael J. Wysocki
@ 2009-12-07 20:56 ` Linus Torvalds
  0 siblings, 0 replies; 235+ messages in thread
From: Linus Torvalds @ 2009-12-07 20:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Zhang Rui, Alan Stern, LKML, ACPI Devel Maling List, pm list

On Mon, 7 Dec 2009, Rafael J. Wysocki wrote:
>
> So I guess the only thing we need at the core level is to call
> async_synchronize_full() after every stage of suspend/resume, right?

Yes and no.

Yes in the sense that _if_ everybody always uses "async_schedule()" (or
whatever the call is named - I've really only written pseudo-code and
haven't even tried to look up the details), then the only thing you need
to do is async_synchronize_full().

But one of the nice things about using just the trivial rwlock model and
letting any async users just depend on that is that we could easily just
depend entirely on those device locks, and allow drivers to do async
shutdowns other ways too.

For example, I could imagine some driver just doing an async suspend (or
resume) that gets completed in an interrupt context, rather than being
done by 'async_schedule()' at all. So in many ways it's nicer to serialize
by just doing

    serialize_all_PM_events()
    {
        for_each_device() {
            down_write(dev->lock);
            up_write(dev->lock);
        }
    }

rather than depend on something like async_synchronize_full() that
obviously waits for all async events, but doesn't have the capability to
wait for any other event that some random driver might be using.

[ That "down+up" is kind of stupid, but I don't think we have a "wait for
  unlocked" rwsem operation. We could add one, and it would be cheaper for
  the case where the device never did anything async at all, and didn't
  really need to dirty that cacheline by doing that write lock/unlock
  pair. ]

But that really isn't a big deal.
I think it would be perfectly ok to also just say "if you do any async PM, you need to use 'async_schedule()' because that's all we're going to wait for". It's probably perfectly fine. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
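[ Illustration only, not part of the thread: the difference between the two
  barriers discussed above can be shown with a small single-threaded model.
  All names here (`struct device`, `struct pending`, `finish_work()`, the
  `model_` and `demo_` functions) are hypothetical, and "waiting" is modeled
  by letting the lock holder finish. The point of the sketch is only that a
  barrier built on the per-device lock also covers work that completes by
  some other route (for example from an interrupt), while a barrier that
  only knows about the async queue does not. ]

```c
#include <assert.h>
#include <stddef.h>

#define NDEV  3
#define NWORK 2

enum source { VIA_ASYNC, VIA_IRQ };   /* how the pending work will complete */

struct device {
    int readers;              /* read holds on dev->lock by in-flight work */
};

struct pending {
    struct device *dev;       /* device whose lock is held for reading */
    enum source src;
    int done;
};

/* Finish one piece of work and release its read hold (models up_read()). */
static void finish_work(struct pending *p)
{
    if (!p->done) {
        p->done = 1;
        p->dev->readers--;
    }
}

/* Model of async_synchronize_full(): only sees work queued via the async API. */
static void model_async_synchronize_full(struct pending *w, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (w[i].src == VIA_ASYNC)
            finish_work(&w[i]);
}

/* Model of down_write(); up_write(): returns once no reader is left,
 * whatever mechanism that reader uses to complete. */
static void down_up_write(struct device *dev, struct pending *w, size_t n)
{
    for (size_t i = 0; i < n && dev->readers > 0; i++)
        if (w[i].dev == dev)
            finish_work(&w[i]);
}

static void serialize_all_PM_events(struct device *devs, size_t ndev,
                                    struct pending *w, size_t n)
{
    for (size_t i = 0; i < ndev; i++)
        down_up_write(&devs[i], w, n);
}

static int busy_devices(const struct device *devs, size_t ndev)
{
    int busy = 0;
    for (size_t i = 0; i < ndev; i++)
        if (devs[i].readers > 0)
            busy++;
    return busy;
}

/* devs[0] has async-queued work, devs[1] has IRQ-completed work,
 * devs[2] never did anything asynchronous at all. */
static void setup(struct device *devs, struct pending *w)
{
    for (size_t i = 0; i < NDEV; i++)
        devs[i].readers = 0;
    w[0] = (struct pending){ &devs[0], VIA_ASYNC, 0 };
    w[1] = (struct pending){ &devs[1], VIA_IRQ, 0 };
    devs[0].readers++;
    devs[1].readers++;
}

int demo_busy_after_async_sync(void)
{
    struct device devs[NDEV];
    struct pending w[NWORK];
    setup(devs, w);
    model_async_synchronize_full(w, NWORK);
    return busy_devices(devs, NDEV);
}

int demo_busy_after_lock_barrier(void)
{
    struct device devs[NDEV];
    struct pending w[NWORK];
    setup(devs, w);
    serialize_all_PM_events(devs, NDEV, w, NWORK);
    return busy_devices(devs, NDEV);
}
```

[ In the model, one device is still busy after the async-only barrier and
  none after the lock-based one. As the message above says, requiring all
  async PM work to go through 'async_schedule()' would make the simpler
  barrier sufficient as well. ]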
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 15:23 ` Alan Stern 2009-12-06 19:04 ` [linux-pm] " Victor Lowther 2009-12-07 3:57 ` Zhang Rui @ 2009-12-07 5:20 ` Linus Torvalds 2009-12-07 15:42 ` Alan Stern 2 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-07 5:20 UTC (permalink / raw) To: Alan Stern; +Cc: Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Sun, 6 Dec 2009, Alan Stern wrote: > > That's ridiculous. Having gone to all the trouble of building a device > tree, one which is presumably still almost entirely correct, why go to > all the trouble of tearing it down only to rebuild it again? (Note: > I'm talking about resume-from-RAM here, not resume-from-hibernation.) Hey, I can believe that it's worth keeping the USB device tree, and just validating it instead. However: > If I understand correctly, what you're suggesting is impractical. You > would have each driver responsible for resuming the devices it > registers. The thing is, for 99% of all devices, we really _really_ don't care. Especially PCI devices. Your average laptop will have something like ten PCI devices on it, and 99% of those have no delays at all outside of the millisecond-level ones that it takes for power management register writes etc to take place. So what I'm suggesting is to NOT DO ANY ASYNC RESUME AT ALL by default. Because async device management is _hard_, and results in various nasty ordering problems that are timing-dependent etc. And it's totally pointless for almost all cases. This is why I think it is so crazy to try to create those idiotic "this device depends on that other" lists etc - it's adding serious conceptual complexity for something that nobody cares about, and that just allows for non-deterministic behavior that we don't even want. > So consider this suggestion: Let's define PM groups. Let's not. I can imagine that doing USB resume specially is worth it, since USB is fundamentally a pretty slow bus. 
But USB is also a fairly clear hierarchy, so there is no point in PM groups or any other information outside of the pure topology. But there is absolutely zero point in doing that for devices in general. PCI drivers simply do not want concurrent initialization. The upsides are basically zero (win a few msecs?) and the downsides are the pointless complexity. We don't do PCI discovery asynchronously either - for all the same reasons. Now, a PCI driver may then implement a bus that is slow (ie SCSI, ATA, USB), and that bus may itself then want to do something else. If it really is a good idea to add the whole hierarchical model to USB suspend/resume, I can live with that, but that is absolutely no excuse for then doing it for cases where the hierarchy is (a) known to be broken (ie the whole PCI multifunction thing, but also things like motherboard power management devices) and (b) don't have the same kind of slow bus issues. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 5:20 ` Linus Torvalds @ 2009-12-07 15:42 ` Alan Stern 0 siblings, 0 replies; 235+ messages in thread From: Alan Stern @ 2009-12-07 15:42 UTC (permalink / raw) To: Linus Torvalds; +Cc: Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Sun, 6 Dec 2009, Linus Torvalds wrote: > I can imagine that doing USB resume specially is worth it, since USB is > fundamentally a pretty slow bus. But USB is also a fairly clear hierarchy, > so there is no point in PM groups or any other information outside of the > pure topology. > > But there is absolutely zero point in doing that for devices in general. > PCI drivers simply do not want concurrent initialization. The upsides are > basically zero (win a few msecs?) and the downsides are the pointless > complexity. We don't do PCI discovery asynchronously either - for all the > same reasons. > > Now, a PCI driver may then implement a bus that is slow (ie SCSI, ATA, > USB), and that bus may itself then want to do something else. If it really > is a good idea to add the whole hierarchical model to USB suspend/resume, > I can live with that, but that is absolutely no excuse for then doing it > for cases where the hierarchy is (a) known to be broken (ie the whole PCI > multifunction thing, but also things like motherboard power management > devices) and (b) don't have the same kind of slow bus issues. Okay. I can understand not wanting to burden everybody else with USB's weaknesses. Simply doing an async suspend & resume of each USB root hub might be enough to give a significant advantage. For the most part these root hubs tend to be registered sequentially with few or no other devices in between.[*] Hence the "stalls" that would occur when suspending a parent or resuming a child wouldn't slow things down very much. We would not always reap the maximum advantage of a fully asynchronous approach but there would be some improvement. This is sort of what Arjan suggested yesterday. 
Its benefit is that nothing outside usbcore has to change. Alan Stern [*] In fact this is true only on systems where the USB host controller drivers are built as modules. If everything is compiled into the kernel then the devices are registered in the worst possible order: controller 1, root hub 1, controller 2, root hub 2, ... I suppose the root hubs could be registered in a delayed work routine. It would be a little awkward but it would solve this issue. ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 2:05 ` Linus Torvalds 2009-12-06 2:36 ` Rafael J. Wysocki 2009-12-06 15:23 ` Alan Stern @ 2009-12-06 19:35 ` Arjan van de Ven 2009-12-06 19:58 ` Linus Torvalds 2009-12-06 20:36 ` Alan Stern 2 siblings, 2 replies; 235+ messages in thread From: Arjan van de Ven @ 2009-12-06 19:35 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern [-- Attachment #1: Type: text/plain, Size: 1938 bytes --] On Sat, 5 Dec 2009 18:05:14 -0800 (PST) Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Sun, 6 Dec 2009, Rafael J. Wysocki wrote: > > > > While the current settings are probably unsafe (like enabling PCI > > devices to be suspended asynchronously by default if there are not > > any direct dependences between them), there are provisions to make > > eveything safe, if we have enough information (which also is needed > > to put the required logic into the drivers). > > I disagree. > > Think of a situation that we already handle pretty poorly: USB mass > storage devices over a suspend/resume. > > > The device tree represents a good deal of the dependences > > between devices and the other dependences may be represented as PM > > links enforcing specific ordering of the PM callbacks. > > The device tree means nothing at all, because it may need to be > entirely rebuilt at resume time. btw I instrumented both the suspend and resume, and made graphs out of it for my laptop (modern laptop with Intel cpu/wifi/graphics of course). http://www.fenrus.org/graphs/suspend.svg http://www.fenrus.org/graphs/resume.svg (also attached for convenience) the resume clearly shows that all this talking about PCI stuff is completely without practical merit.. it's the USB stuff where the time is spent. in suspend, there's a PCI device (:1b) that does take some time, which is the audio controller. The bulk of the time is in the serio driver though.. 
As an "interested bystander" to this thread.... sounds like Linus' arguments have merit, and that solving the USB resume to go async in some form will fix pretty much all we want solving... [and that at least we need to do this stuff data/measurement driven, and not just based on how we THINK things work] -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org [-- Attachment #2: suspend.svgz --] [-- Type: image/svg+xml-compressed, Size: 2733 bytes --] [-- Attachment #3: resume.svgz --] [-- Type: image/svg+xml-compressed, Size: 2852 bytes --] ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33
  2009-12-06 19:35 ` Arjan van de Ven
@ 2009-12-06 19:58 ` Linus Torvalds
  2009-12-06 20:18   ` Arjan van de Ven
  2009-12-06 20:36 ` Alan Stern
  1 sibling, 1 reply; 235+ messages in thread
From: Linus Torvalds @ 2009-12-06 19:58 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern

On Sun, 6 Dec 2009, Arjan van de Ven wrote:
>
> in suspend, there's a PCI device (:1b) that does take some time, which
> is the audio controller. The bulk of the time is in the serio driver
> though..

That serio thing is disgusting. We had serious problems with the serial
driver timeouts for boot-time optimizations too, didn't we?

I assume that you don't even _use_ that serial port, do you? Or is it
open for serial console logging or something? If it isn't even open, we
shouldn't waste any time on the hardware.

Your graph seems to say that serio1 shutdown is roughly from 29.40 to
29.85, ie almost half a second. That's just bogus.

I don't see where it comes from, though. It looks like we have

 - pciserial_suspend_ports/serial_pnp_suspend ->
     serial8250_suspend_port ->
       uart_suspend_port ->
         (wait for tx_empty, but only for ASYNC_INITIALIZED, which
          shouldn't be true if it's closed, and should be limited to 30ms)
         uart_change_pm ->
           serial8250_pm

and none of them look like they should take anywhere close to half a
second. So I'm obviously missing something, and your chart didn't include
the sleep/wakeup pairs.

		Linus

^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 19:58 ` Linus Torvalds @ 2009-12-06 20:18 ` Arjan van de Ven 2009-12-06 21:08 ` Linus Torvalds 2009-12-06 22:54 ` Dmitry Torokhov 0 siblings, 2 replies; 235+ messages in thread From: Arjan van de Ven @ 2009-12-06 20:18 UTC (permalink / raw) To: Linus Torvalds Cc: Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, 6 Dec 2009 11:58:44 -0800 (PST) Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Sun, 6 Dec 2009, Arjan van de Ven wrote: > > > > in suspend, there's a PCI device (:1b) that does take some time, > > which is the audio controller. The bulk of the time is in the serio > > driver though.. > > That serio thing is disgusting. We had serious problems with the > serial driver timeouts for boot-time optimizations too, didn't we? isn't serio the PS/2 stuff? (serio was an issue during boot as well due to some interesting rcu delays btw) > and none of them look like they should take anywhere close to half a > second. So I'm obviously missing something, and your chart didn't > include the sleep/wakeup pairs. what do you mean by this? what would you like to see ? (I have a separate graph for resume.. but the graphing program does not show those things that take so short to resume that the font to print the name would be less than a pixel; can fix that) [fwiw I care more about resume speed than suspend speed, but obviously am happy if both get fixed... just resume tends to be much more user interesting, just like getting out of the idle loop matters more than getting into it] -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 20:18 ` Arjan van de Ven @ 2009-12-06 21:08 ` Linus Torvalds 2009-12-06 22:54 ` Dmitry Torokhov 1 sibling, 0 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-06 21:08 UTC (permalink / raw) To: Arjan van de Ven Cc: Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, 6 Dec 2009, Arjan van de Ven wrote: > > That serio thing is disgusting. We had serious problems with the > > serial driver timeouts for boot-time optimizations too, didn't we? > > isn't serio the PS/2 stuff? Oh, you're right, I just assumed it was regular serial. So it's the keyboard and mouse. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 20:18 ` Arjan van de Ven 2009-12-06 21:08 ` Linus Torvalds @ 2009-12-06 22:54 ` Dmitry Torokhov 2009-12-07 0:55 ` Arjan van de Ven 2009-12-07 1:18 ` Arjan van de Ven 1 sibling, 2 replies; 235+ messages in thread From: Dmitry Torokhov @ 2009-12-06 22:54 UTC (permalink / raw) To: Arjan van de Ven Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern On Dec 6, 2009, at 12:18 PM, Arjan van de Ven <arjan@infradead.org> wrote: > On Sun, 6 Dec 2009 11:58:44 -0800 (PST) > Linus Torvalds <torvalds@linux-foundation.org> wrote: > >> >> >> On Sun, 6 Dec 2009, Arjan van de Ven wrote: >>> >>> in suspend, there's a PCI device (:1b) that does take some time, >>> which is the audio controller. The bulk of the time is in the serio >>> driver though.. >> >> That serio thing is disgusting. We had serious problems with the >> serial driver timeouts for boot-time optimizations too, didn't we? > > isn't serio the PS/2 stuff? Yes, that's your PS/2 mouse (rather touchpad) and the delay comes from device reset (needed by some keyboard controllers - I remember HP -or it and keyboard will be dead at resume). -- Dmitry ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 22:54 ` Dmitry Torokhov @ 2009-12-07 0:55 ` Arjan van de Ven 2009-12-07 2:27 ` Dmitry Torokhov 2009-12-07 1:18 ` Arjan van de Ven 1 sibling, 1 reply; 235+ messages in thread From: Arjan van de Ven @ 2009-12-07 0:55 UTC (permalink / raw) To: Dmitry Torokhov Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, 6 Dec 2009 14:54:48 -0800 Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > > isn't serio the PS/2 stuff? > > Yes, that's your PS/2 mouse (rather touchpad) and the delay comes > from device reset (needed by some keyboard controllers - I remember > HP -or it and keyboard will be dead at resume). and I have a HP laptop... so this makes perfect sense. Thanks for the explanation! Now, the good news is that serio is near invisible on resume... -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 0:55 ` Arjan van de Ven @ 2009-12-07 2:27 ` Dmitry Torokhov 2009-12-07 5:26 ` Arjan van de Ven 0 siblings, 1 reply; 235+ messages in thread From: Dmitry Torokhov @ 2009-12-07 2:27 UTC (permalink / raw) To: Arjan van de Ven Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, Dec 06, 2009 at 04:55:51PM -0800, Arjan van de Ven wrote: > On Sun, 6 Dec 2009 14:54:48 -0800 > Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > > > > isn't serio the PS/2 stuff? > > > > Yes, that's your PS/2 mouse (rather touchpad) and the delay comes > > from device reset (needed by some keyboard controllers - I remember > > HP -or it and keyboard will be dead at resume). > > and I have a HP laptop... so this makes perfect sense. > Thanks for the explenation! > Well, we do it for everyone, it's just a particular series of HPs forced us to add it. > Now, the good news is that serio is near invisible on resume... > Resume is fully offloaded to kseriod. -- Dmitry ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 2:27 ` Dmitry Torokhov @ 2009-12-07 5:26 ` Arjan van de Ven 2009-12-07 6:00 ` Dmitry Torokhov 0 siblings, 1 reply; 235+ messages in thread From: Arjan van de Ven @ 2009-12-07 5:26 UTC (permalink / raw) To: Dmitry Torokhov Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, 6 Dec 2009 18:27:56 -0800 Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > On Sun, Dec 06, 2009 at 04:55:51PM -0800, Arjan van de Ven wrote: > > On Sun, 6 Dec 2009 14:54:48 -0800 > > Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > > > > > > isn't serio the PS/2 stuff? > > > > > > Yes, that's your PS/2 mouse (rather touchpad) and the delay comes > > > from device reset (needed by some keyboard controllers - I > > > remember HP -or it and keyboard will be dead at resume). > > > > and I have a HP laptop... so this makes perfect sense. > > Thanks for the explenation! > > > > Well, we do it for everyone, it's just a particular series of HPs > forced us to add it. > wonder if it should be a DMI based quirk instead... -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 5:26 ` Arjan van de Ven @ 2009-12-07 6:00 ` Dmitry Torokhov 2009-12-21 9:01 ` Pavel Machek 0 siblings, 1 reply; 235+ messages in thread From: Dmitry Torokhov @ 2009-12-07 6:00 UTC (permalink / raw) To: Arjan van de Ven Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, Dec 06, 2009 at 09:26:00PM -0800, Arjan van de Ven wrote: > On Sun, 6 Dec 2009 18:27:56 -0800 > Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > > > On Sun, Dec 06, 2009 at 04:55:51PM -0800, Arjan van de Ven wrote: > > > On Sun, 6 Dec 2009 14:54:48 -0800 > > > Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > > > > > > > > isn't serio the PS/2 stuff? > > > > > > > > Yes, that's your PS/2 mouse (rather touchpad) and the delay comes > > > > from device reset (needed by some keyboard controllers - I > > > > remember HP -or it and keyboard will be dead at resume). > > > > > > and I have a HP laptop... so this makes perfect sense. > > > Thanks for the explenation! > > > > > > > Well, we do it for everyone, it's just a particular series of HPs > > forced us to add it. > > > wonder if it should be a DMI based quirk instead... > I have not received reports where it causes harm or reduces functionality so I'd prefer having it by default and not try to race with manufacturers. -- Dmitry ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 6:00 ` Dmitry Torokhov @ 2009-12-21 9:01 ` Pavel Machek 0 siblings, 0 replies; 235+ messages in thread From: Pavel Machek @ 2009-12-21 9:01 UTC (permalink / raw) To: Dmitry Torokhov Cc: Arjan van de Ven, Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun 2009-12-06 22:00:53, Dmitry Torokhov wrote: > On Sun, Dec 06, 2009 at 09:26:00PM -0800, Arjan van de Ven wrote: > > On Sun, 6 Dec 2009 18:27:56 -0800 > > Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > > > > > On Sun, Dec 06, 2009 at 04:55:51PM -0800, Arjan van de Ven wrote: > > > > On Sun, 6 Dec 2009 14:54:48 -0800 > > > > Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > > > > > > > > > > isn't serio the PS/2 stuff? > > > > > > > > > > Yes, that's your PS/2 mouse (rather touchpad) and the delay comes > > > > > from device reset (needed by some keyboard controllers - I > > > > > remember HP -or it and keyboard will be dead at resume). > > > > > > > > and I have a HP laptop... so this makes perfect sense. > > > > Thanks for the explenation! > > > > > > > > > > Well, we do it for everyone, it's just a particular series of HPs > > > forced us to add it. > > > > > wonder if it should be a DMI based quirk instead... > > > > I have not received reports where it causes harm or reduces > functionality so I'd prefer having it by default and not try to race > with manufacturers. Well, it slows down everyone... and people are actually testing with linux, so it makes this problem more common on new systems. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 22:54 ` Dmitry Torokhov 2009-12-07 0:55 ` Arjan van de Ven @ 2009-12-07 1:18 ` Arjan van de Ven 2009-12-07 2:27 ` Dmitry Torokhov 1 sibling, 1 reply; 235+ messages in thread From: Arjan van de Ven @ 2009-12-07 1:18 UTC (permalink / raw) To: Dmitry Torokhov Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, 6 Dec 2009 14:54:48 -0800 Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > Yes, that's your PS/2 mouse (rather touchpad) and the delay comes > from device reset (needed by some keyboard controllers - I remember > HP -or it and keyboard will be dead at resume). > btw could we do this reset in an async function call (as long as we wait for it to complete before we pull the plug finally) ? -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 1:18 ` Arjan van de Ven @ 2009-12-07 2:27 ` Dmitry Torokhov 2009-12-07 5:31 ` Arjan van de Ven 0 siblings, 1 reply; 235+ messages in thread From: Dmitry Torokhov @ 2009-12-07 2:27 UTC (permalink / raw) To: Arjan van de Ven Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, Dec 06, 2009 at 05:18:56PM -0800, Arjan van de Ven wrote: > On Sun, 6 Dec 2009 14:54:48 -0800 > Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > > > Yes, that's your PS/2 mouse (rather touchpad) and the delay comes > > from device reset (needed by some keyboard controllers - I remember > > HP -or it and keyboard will be dead at resume). > > > > btw could we do this reset in an async function call (as long as we > wait for it to complete before we pull the plug finally) ? It has to complete before we start shutting down i8042, so there are dependencies involved... -- Dmitry ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33
  2009-12-07  2:27 ` Dmitry Torokhov
@ 2009-12-07  5:31 ` Arjan van de Ven
  2009-12-07  6:15   ` Dmitry Torokhov
  0 siblings, 1 reply; 235+ messages in thread
From: Arjan van de Ven @ 2009-12-07 5:31 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List,
	pm list, Alan Stern

On Sun, 6 Dec 2009 18:27:07 -0800
Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote:

> On Sun, Dec 06, 2009 at 05:18:56PM -0800, Arjan van de Ven wrote:
> > On Sun, 6 Dec 2009 14:54:48 -0800
> > Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote:
> >
> > > Yes, that's your PS/2 mouse (rather touchpad) and the delay comes
> > > from device reset (needed by some keyboard controllers - I
> > > remember HP -or it and keyboard will be dead at resume).
> > >
> >
> > btw could we do this reset in an async function call (as long as we
> > wait for it to complete before we pull the plug finally) ?
>
> It has to complete before we start shutting down i8042, so there are
> dependencies involved...

async function calls have 2 methods for synchronization:

* inside an async function, you can wait for all "earlier" async
  functions to complete (async_synchronize_cookie)
* outside an async function, you can wait for all scheduled async
  functions to complete (async_synchronize_full)

so there's two options to use the async code to cut down this time:

1) Make both the mouse, keyboard AND the i8042 suspend functions async,
   and in the i8042 function the code first synchronizes on all previous
   async work
2) only make the mouse and keyboard suspend async, and just wait for all
   async work in i8042 suspend

I strongly prefer number 1, in terms of getting the best suspend speed.
It means that all other suspend code can run in parallel to the whole
serio/i8042 suspend.
Option two is simpler, but the delay is in the normal, synchronous path,
so other suspend code will not run in parallel.
The good news is that neither is hard for someone familiar with the code... -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 235+ messages in thread
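[ Illustration only, not part of the thread: option 1 above can be modeled
  in userspace with a tiny single-threaded stand-in for the kernel's async
  cookies. The `model_` functions, `cookie_t`, and the three `*_suspend()`
  bodies are hypothetical names invented for this sketch; they only mimic
  the documented behaviour of async_schedule(), async_synchronize_cookie()
  and async_synchronize_full(), and are not the kernel implementation. The
  "scheduler" deliberately picks the newest job first, the worst case for
  ordering, to show that the cookie barrier still puts the controller
  last. ]

```c
#include <assert.h>
#include <string.h>

typedef unsigned long cookie_t;            /* stand-in for async_cookie_t */
typedef void (*async_fn)(cookie_t cookie);

struct job {
    async_fn fn;
    cookie_t cookie;                       /* increases with schedule order */
    int started;
};

#define MAX_JOBS 8
static struct job jobs[MAX_JOBS];
static int njobs;
static char order[128];                    /* order in which bodies ran */

/* Queue a function; later calls get larger cookies. */
static cookie_t model_async_schedule(async_fn fn)
{
    assert(njobs < MAX_JOBS);
    jobs[njobs] = (struct job){ fn, (cookie_t)njobs + 1, 0 };
    return jobs[njobs++].cookie;
}

static void run_job(struct job *j)
{
    if (j->started)
        return;
    j->started = 1;
    j->fn(j->cookie);
}

/* Wait for everything scheduled before 'cookie' (here: run it now). */
static void model_async_synchronize_cookie(cookie_t cookie)
{
    for (int i = 0; i < njobs; i++)
        if (jobs[i].cookie < cookie)
            run_job(&jobs[i]);
}

/* Wait for everything; newest first, to stress the ordering. */
static void model_async_synchronize_full(void)
{
    for (int i = njobs - 1; i >= 0; i--)
        run_job(&jobs[i]);
}

/* Hypothetical suspend bodies for the devices named in the message. */
static void psmouse_suspend(cookie_t c) { (void)c; strcat(order, "psmouse "); }
static void atkbd_suspend(cookie_t c)   { (void)c; strcat(order, "atkbd "); }

static void i8042_suspend(cookie_t c)
{
    /* "I know I'm critical, so everything before me needs to be done." */
    model_async_synchronize_cookie(c);
    strcat(order, "i8042 ");
}

const char *demo_option_one(void)
{
    njobs = 0;
    order[0] = '\0';

    model_async_schedule(psmouse_suspend);
    model_async_schedule(atkbd_suspend);
    model_async_schedule(i8042_suspend);

    /* The core waits for all async work before the final power-down. */
    model_async_synchronize_full();
    return order;
}
```

[ Note that the barrier in the controller's function names no specific
  child: it waits for whatever was scheduled earlier, so in the worst case
  it degrades to fully serialized behaviour. ]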
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 5:31 ` Arjan van de Ven @ 2009-12-07 6:15 ` Dmitry Torokhov 2009-12-07 6:31 ` Arjan van de Ven 0 siblings, 1 reply; 235+ messages in thread From: Dmitry Torokhov @ 2009-12-07 6:15 UTC (permalink / raw) To: Arjan van de Ven Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, Dec 06, 2009 at 09:31:12PM -0800, Arjan van de Ven wrote: > On Sun, 6 Dec 2009 18:27:07 -0800 > Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > > > On Sun, Dec 06, 2009 at 05:18:56PM -0800, Arjan van de Ven wrote: > > > On Sun, 6 Dec 2009 14:54:48 -0800 > > > Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > > > > > > > Yes, that's your PS/2 mouse (rather touchpad) and the delay comes > > > > from device reset (needed by some keyboard controllers - I > > > > remember HP -or it and keyboard will be dead at resume). > > > > > > > > > > btw could we do this reset in an async function call (as long as we > > > wait for it to complete before we pull the plug finally) ? > > > > It has to complete before we start shutting down i8042, so there are > > dependencies involved... > > async function calls have 2 methods for synchronization: > > * inside an async function, you can wait for all "earlier" async > functions to complete (async_synchronize_cookie) > * outside an async function, you can wait for all scheduled async > functions to complete (async_synchronize_full) > > so there's two options to use the async code to cut down this time: > > 1) Make both the mouse, keyboard AND the i8042 suspend functions async, > and in the i8042 function the code first synchronizes on all previous > async work > 2) only make the mouse and keyboard suspend async, and just wait for all > async work in i8042 suspend > > I strongly prefer number 1, in terms of getting the best suspend speed. > It means that all other suspend code can run in parallel to the whole > serio/i8042 suspend. 
> Option two is simpler, but the delay is in the normal, synchronous path, > so other suspend code will not run in parallel. > > The good news is that neither is hard for someone familiar with the > code... > And the bad thing is that it violates multiple layers in the kernel. Atkbd driver does not have to be using i8042; neither does psmouse. Although they do in 99% of the cases, there are other controllers providing the i8042-style ports. Just grep for SERIO_8042 in drivers/input/serio. I do not want to hard-code the i8042-psmouse-atkbd dependency. -- Dmitry ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 6:15 ` Dmitry Torokhov @ 2009-12-07 6:31 ` Arjan van de Ven 2009-12-07 6:32 ` Dmitry Torokhov 0 siblings, 1 reply; 235+ messages in thread From: Arjan van de Ven @ 2009-12-07 6:31 UTC (permalink / raw) To: Dmitry Torokhov Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, 6 Dec 2009 22:15:49 -0800 Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > And the bad thing is that it violates multiple layers in the kernel. > Atkbd driver does not have to be using i8042; neither does psmouse. > Although they do in 99% of the cases, there are other controllers > providing the i8042-style ports. Just grep for SERIO_8042 in > drivers/input/serio. > > I do not want to hard-code the i8042-psmouse-atkbd dependency. it's not a specific dependency. it's a "I know I'm critical, so everything before me needs to be done". that doesn't encode an actual relationship, it encodes a potential relationship... with a worst case behavior of ... what we do right now ;_) -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 6:31 ` Arjan van de Ven @ 2009-12-07 6:32 ` Dmitry Torokhov 2009-12-07 15:17 ` Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Dmitry Torokhov @ 2009-12-07 6:32 UTC (permalink / raw) To: Arjan van de Ven Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list, Alan Stern On Sun, Dec 06, 2009 at 10:31:12PM -0800, Arjan van de Ven wrote: > On Sun, 6 Dec 2009 22:15:49 -0800 > Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > > > And the bad thing is that it violates multiple layers in the kernel. > > Atkbd driver does not have to be using i8042; neither does psmouse. > > Although they do in 99% of the cases, there are other controllers > > providing the i8042-style ports. Just grep for SERIO_8042 in > > drivers/input/serio. > > > > I do not want to hard-code the i8042-psmouse-atkbd dependency. > > it's not a specific dependency. > > it's a "I know I'm critical, so everything before me needs to be done". > > that doesn't encode an actual relationship, it encodes a potential > relationship... with a worst case behavior of ... what we do right > now ;_) This is the case with every parent device, isn't it? It is important for its children. And wasn't Rafael's patchset trying to address exactly this? -- Dmitry ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-07 6:32 ` Dmitry Torokhov @ 2009-12-07 15:17 ` Rafael J. Wysocki 0 siblings, 0 replies; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-07 15:17 UTC (permalink / raw) To: Dmitry Torokhov Cc: Arjan van de Ven, Linus Torvalds, LKML, ACPI Devel Maling List, pm list, Alan Stern On Monday 07 December 2009, Dmitry Torokhov wrote: > On Sun, Dec 06, 2009 at 10:31:12PM -0800, Arjan van de Ven wrote: > > On Sun, 6 Dec 2009 22:15:49 -0800 > > Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > > > > > And the bad thing is that it violates multiple layers in the kernel. > > > Atkbd driver does not have to be using i8042; neither does psmouse. > > > Although they do in 99% of the cases, there are other controllers > > > providing the i8042-style ports. Just grep for SERIO_8042 in > > > drivers/input/serio. > > > > > > I do not want to hard-code the i8042-psmouse-atkbd dependency. > > > > it's not a specific dependency. > > > > it's a "I know I'm critical, so everything before me needs to be done". > > > > that doesn't encode an actual relationship, it encodes a potential > > relationship... with a worst case behavior of ... what we do right > > now ;_) > > This is the case with every parent device, isn't it? It is important for > its children. And wasn't Rafael's patchset trying to address exactly > this? Yes, it was. Thanks, Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 19:35 ` Arjan van de Ven 2009-12-06 19:58 ` Linus Torvalds @ 2009-12-06 20:36 ` Alan Stern 2009-12-06 21:17 ` Arjan van de Ven 1 sibling, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-06 20:36 UTC (permalink / raw) To: Arjan van de Ven Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Sun, 6 Dec 2009, Arjan van de Ven wrote: > btw I instrumented both the suspend and resume, and made graphs out of > it for my laptop (modern laptop with Intel cpu/wifi/graphics of course). > > http://www.fenrus.org/graphs/suspend.svg > http://www.fenrus.org/graphs/resume.svg > > (also attached for convenience) > > the resume clearly shows that all this talking about PCI stuff is > completely without practical merit.. it's the USB stuff where the time > is spent. Arjan, can you try testing the USB timings again with the patch below (for vanilla 2.6.32)? Fair warning: I just composed this and haven't tried it out myself. Thanks, Alan Stern Index: 2.6.32/drivers/usb/core/driver.c =================================================================== --- 2.6.32.orig/drivers/usb/core/driver.c +++ 2.6.32/drivers/usb/core/driver.c @@ -1313,8 +1313,9 @@ static int usb_resume_both(struct usb_de * then we're stuck. */ status = usb_resume_device(udev, msg); } - } else if (udev->reset_resume) + } else { status = usb_resume_device(udev, msg); + } if (status == 0 && udev->actconfig) { for (i = 0; i < udev->actconfig->desc.bNumInterfaces; i++) { Index: 2.6.32/drivers/usb/core/hub.c =================================================================== --- 2.6.32.orig/drivers/usb/core/hub.c +++ 2.6.32/drivers/usb/core/hub.c @@ -1674,7 +1674,7 @@ static int usb_configure_device_otg(stru * (Includes HNP test device.) 
*/ if (udev->bus->b_hnp_enable || udev->bus->is_b_host) { - err = usb_port_suspend(udev, PMSG_SUSPEND); + err = usb_port_suspend(udev, PMSG_AUTO_SUSPEND); if (err < 0) dev_dbg(&udev->dev, "HNP fail, %d\n", err); } @@ -2060,6 +2060,7 @@ static int check_port_resume_type(struct /* * usb_port_suspend - suspend a usb device's upstream port * @udev: device that's no longer in active use, not a root hub + * @msg: Power Management message describing this state transition * Context: must be able to sleep; device not locked; pm locks held * * Suspends a USB device that isn't in active use, conserving power. @@ -2107,7 +2108,7 @@ int usb_port_suspend(struct usb_device * { struct usb_hub *hub = hdev_to_hub(udev->parent); int port1 = udev->portnum; - int status; + int status = 0; // dev_dbg(hub->intfdev, "suspend port %d\n", port1); @@ -2128,6 +2129,13 @@ int usb_port_suspend(struct usb_device * status); } + /* For system sleep transitions we don't actually need to suspend + * the port. The device will suspend itself when the entire bus + * is suspended. + */ + if (!(msg.event & (PM_EVENT_USER | PM_EVENT_REMOTE | PM_EVENT_AUTO))) + return status; + /* see 7.1.7.6 */ status = set_port_feature(hub->hdev, port1, USB_PORT_FEAT_SUSPEND); if (status) { @@ -2231,6 +2239,7 @@ static int finish_port_resume(struct usb /* * usb_port_resume - re-activate a suspended usb device's upstream port * @udev: device to re-activate, not a root hub + * @msg: Power Management message describing this state transition * Context: must be able to sleep; device not locked; pm locks held * * This will re-activate the suspended device, increasing power usage ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 20:36 ` Alan Stern @ 2009-12-06 21:17 ` Arjan van de Ven 2009-12-06 21:46 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Arjan van de Ven @ 2009-12-06 21:17 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Sun, 6 Dec 2009 15:36:40 -0500 (EST) Alan Stern <stern@rowland.harvard.edu> wrote: > On Sun, 6 Dec 2009, Arjan van de Ven wrote: > > > btw I instrumented both the suspend and resume, and made graphs out > > of it for my laptop (modern laptop with Intel cpu/wifi/graphics of > > course). > > > > http://www.fenrus.org/graphs/suspend.svg > > http://www.fenrus.org/graphs/resume.svg > > > > (also attached for convenience) > > > > the resume clearly shows that all this talking about PCI stuff is > > completely without practical merit.. it's the USB stuff where the > > time is spent. > > Arjan, can you try testing the USB timings again with the patch below > (for vanilla 2.6.32)? > > Fair warning: I just composed this and haven't tried it out myself. unfortunately it does not make a difference that I can notice in the graphs. http://www.fenrus.org/graphs/resume2.svg the resume problem seems to be that we resume all the hubs sequentially, much like we used to discover them sequentially during boot.... I do not know how much I'm asking for, but would it be sensible to do a similar thing for hub resume as we did for boot? eg start resuming them all at the same time, so that the mandatory delays of these hubs will overlap ? -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 21:17 ` Arjan van de Ven @ 2009-12-06 21:46 ` Alan Stern 2009-12-06 21:57 ` Arjan van de Ven 0 siblings, 1 reply; 235+ messages in thread From: Alan Stern @ 2009-12-06 21:46 UTC (permalink / raw) To: Arjan van de Ven Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Sun, 6 Dec 2009, Arjan van de Ven wrote: > > Arjan, can you try testing the USB timings again with the patch below > > (for vanilla 2.6.32)? > > > > Fair warning: I just composed this and haven't tried it out myself. > > unfortunately it does not make a difference that I can notice in the > graphs. > > http://www.fenrus.org/graphs/resume2.svg Disappointing... > the resume problem seems to be that we resume all the hubs sequentially, > much like we used to discover them sequentially during boot.... But the patch should have reduced the time required to resume each non-root hub. So the fact that they go sequentially shouldn't matter as much. For root hubs the patch won't help. Their delays can't be reduced. > I do not know how much I'm asking for, but would it be sensible to do a > similar thing for hub resume as we did for boot? eg start resuming them > all at the same time, so that the mandatory delays of these hubs will > overlap ? For one thing, there shouldn't be any mandatory delays for non-root hubs during resume-from-RAM (although this depends to some extent on your system firmware -- and it probably helps to have USB-2.0 hubs rather than USB-1.1). More importantly, what you're asking is impossible given the way the PM core is structured. The hub-resume routine can't return early because then it wouldn't be possible to resume devices plugged into that hub. (Ironically, your request is essentially what Rafael was trying to accomplish in the patches that provoked this email conversation.) Guess I'll just have to try out your timing log addition for myself and see what's going on... 
Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 21:46 ` Alan Stern @ 2009-12-06 21:57 ` Arjan van de Ven 2009-12-06 22:04 ` Alan Stern 0 siblings, 1 reply; 235+ messages in thread From: Arjan van de Ven @ 2009-12-06 21:57 UTC (permalink / raw) To: Alan Stern Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Sun, 6 Dec 2009 16:46:16 -0500 (EST) Alan Stern <stern@rowland.harvard.edu> wrote: > For root hubs the patch won't help. Their delays can't be reduced. > > > I do not know how much I'm asking for, but would it be sensible to > > do a similar thing for hub resume as we did for boot? eg start > > resuming them all at the same time, so that the mandatory delays of > > these hubs will overlap ? > > For one thing, there shouldn't be any mandatory delays for non-root > hubs during resume-from-RAM (although this depends to some extent on > your system firmware -- and it probably helps to have USB-2.0 hubs > rather than USB-1.1). > > More importantly, what you're asking is impossible given the way the > PM core is structured. The hub-resume routine can't return early > because then it wouldn't be possible to resume devices plugged into > that hub. having spent 30 minutes trying to grok this code, I think there may be a trick in using the async function call infrastructure. if each USB hub's resume (hub_resume()) would be done as an async function call, that would start allowing the hub resumes to go async, but this is not enough. usb_resume_both() would also then need to be an async call itself, and do its "resume the parent" recursion as an async function call, and then it needs to do a synchronization before actually resuming the device itself (provided it is not a hub or hub like device I suppose). the later synchronization guarantees that no device will be resumed before its parent tree structure is resumed, while allowing parallel parts of the tree to be resumed in parallel. The hard part in this is the locking.... 
that is getting non-trivial once you have multiple asynchronous functions executing. -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 21:57 ` Arjan van de Ven @ 2009-12-06 22:04 ` Alan Stern 0 siblings, 0 replies; 235+ messages in thread From: Alan Stern @ 2009-12-06 22:04 UTC (permalink / raw) To: Arjan van de Ven Cc: Linus Torvalds, Rafael J. Wysocki, LKML, ACPI Devel Maling List, pm list On Sun, 6 Dec 2009, Arjan van de Ven wrote: > having spent 30 minutes trying to grok this code, I think there may be > a trick in using the async function call infrastructure. > > if each USB hub's resume (hub_resume()) would be done as an async > function call, that would start allowing the hub resumes to go async, > but this is not enough. > > usb_resume_both() would also then need to be an async call itself, and > do its "resume the parent" recursion as an async function call, and then > it needs to do a synchronization before actually resuming the device > itself (provided it is not a hub or hub like device I suppose). > > the later synchronization guarantees that no device will be resumed > before its parent tree structure is resumed, while allowing parallel > parts of the tree to be resumed in parallel. > > The hard part in this is the locking.... that is getting non-trivial > once you have multiple asynchronous functions executing. That's the whole point of Rafael's async suspend/resume framework. He has done the hard work already. Alan Stern ^ permalink raw reply [flat|nested] 235+ messages in thread
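The ordering rule discussed in this sub-thread (no device resumes before its parent, while independent subtrees proceed in parallel) can be sketched as a small timing model. This is a userspace illustration only, not kernel code: `struct dev_model`, both function names, the example tree and every millisecond figure are made up for the example, and the model assumes parents are listed before their children.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a device: index of its parent (-1 for a root)
 * and the time its own resume callback takes, in milliseconds. */
struct dev_model {
	int parent;
	unsigned int resume_ms;
};

/* Fully sequential resume: every delay adds up. */
unsigned int sequential_resume_ms(const struct dev_model *devs, size_t n)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += devs[i].resume_ms;
	return total;
}

#define MAX_DEVS 64	/* arbitrary limit of this model */

/* Parent-before-child resume with independent subtrees in parallel:
 * a device finishes at (parent's finish time + its own delay), so the
 * total is the longest root-to-leaf path. */
unsigned int parallel_resume_ms(const struct dev_model *devs, size_t n)
{
	unsigned int finish[MAX_DEVS];
	unsigned int longest = 0;
	size_t i;

	if (n > MAX_DEVS)
		return 0;	/* model limit exceeded */

	for (i = 0; i < n; i++) {
		unsigned int start = 0;

		/* Assumption: parents appear earlier in the array. */
		assert(devs[i].parent < (int)i);
		if (devs[i].parent >= 0)
			start = finish[devs[i].parent];
		finish[i] = start + devs[i].resume_ms;
		if (finish[i] > longest)
			longest = finish[i];
	}
	return longest;
}

/* Made-up tree: one host controller, three hubs, one device behind
 * the first hub. */
const struct dev_model example_tree[] = {
	{ -1, 10 },	/* 0: host controller */
	{  0, 100 },	/* 1: hub A */
	{  0, 100 },	/* 2: hub B */
	{  0, 100 },	/* 3: hub C */
	{  1, 20 },	/* 4: device behind hub A */
};
```

With these invented numbers, `sequential_resume_ms(example_tree, 5)` gives 330 and `parallel_resume_ms(example_tree, 5)` gives 130: the three hub delays overlap, and only the longest chain (controller, hub A, device) determines the total. The model ignores locking, which is the part the thread identifies as hard.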
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-05 21:43 ` Linus Torvalds 2009-12-05 21:58 ` Linus Torvalds @ 2009-12-06 0:29 ` Rafael J. Wysocki 2009-12-06 0:52 ` Linus Torvalds 1 sibling, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-06 0:29 UTC (permalink / raw) To: Linus Torvalds; +Cc: LKML, ACPI Devel Maling List, pm list On Saturday 05 December 2009, Linus Torvalds wrote: > > On Sat, 5 Dec 2009, Rafael J. Wysocki wrote: > > > > * Asynchronous suspend and resume infrastructure. For now, PCI, ACPI and > > serio devices are enabled to suspend and resume asynchronously. > > I really think this is totally and utterly broken. Both from an > implementation standpoint _and_ from a pure conceptual one. > > Why isn't the suspend/resume async stuff just done like the init async > stuff? > > We don't need that crazy per-device flag for initialization, neither do we > need drivers "enabling" any async code at all. They just do some things > asynchronously, and then at the end of init time we wait for all those > async events. > > So why does suspend/resume need to do crazy sh*t instead? Because it can run entire suspend and resume callbacks in parallel and not just some stuff inside of them. The flag is to tell it which callbacks not to execute in parallel, but it essentially should not be necessary as soon as we know all dependences between devices (ie. the ones that are not encoded in the structure of the device tree). The problem is there are dependences between devices we're not aware of, which are not documented anywhere and not reflected by the device tree structure and we need some time to figure them out. > It all looks terminally broken: you force async suspend for all PCI > drivers, even when it makes no sense. I'm not exactly sure what you're referring to. The async suspend is not forced, it just tells the PM core that it can execute PCI suspend/resume callbacks in parallel as long as the devices in question don't depend on each other. 
> Rather than let the drivers that already know how to do things like disk > spinup asynchronously just do it that way. This isn't just about disk spin up and things like that. If we can run entire suspend/resume callbacks in parallel, why not do that? > The "timing" routines are also just crazy. What is the excuse for > dpm_show_time() taking both start and stop times, This is a mistake, although really easily fixable in a followup patch. > since there is never any valid situation when it shouldn't have that > do_gettimeofday(&stop) just before it? IOW - the whole end-time thing should > be _inside_ dpm_show_time, rather than being done by the caller. No? Yes, you're right. > In other words - I'm not pulling this crazy thing. You'd better explain > why it was done that way, when we already have done the same things better > before in different ways. I'm not sure we have, but whatever. As I said before, if the rest of the changes in my pull request are fine with you, I'll just drop the async changes, although I'm not really convinced they're so bad. They've been discussed a lot and they've been in linux-next for a few months without any objection from anyone. Thanks, Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
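The interface change agreed on above (sample the stop time inside the reporting helper, so callers pass only the start time) might look roughly like the following. This is a userspace stand-in, not the actual dpm_show_time() from the patch set: the name `show_time_since`, its signature, its return value and the message format are assumptions, and POSIX gettimeofday() is used in place of the in-kernel do_gettimeofday().

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical helper: the caller records only the start time, and the
 * stop time is sampled here, so no call site can forget to take it.
 * Returns the elapsed time in microseconds. */
long long show_time_since(const struct timeval *start, const char *what)
{
	struct timeval stop;
	long long usecs;

	assert(start != NULL);

	gettimeofday(&stop, NULL);
	usecs = (long long)(stop.tv_sec - start->tv_sec) * 1000000LL
		+ (stop.tv_usec - start->tv_usec);
	if (usecs < 0)
		usecs = 0;	/* wall clock was stepped backwards */

	printf("%s of devices complete after %lld.%03lld msecs\n",
	       what, usecs / 1000, usecs % 1000);
	return usecs;
}
```

A caller would take `gettimeofday(&start, NULL)` before the suspend or resume phase and call `show_time_since(&start, "resume")` once it finishes.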
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 0:29 ` Rafael J. Wysocki @ 2009-12-06 0:52 ` Linus Torvalds 2009-12-06 1:24 ` Rafael J. Wysocki 0 siblings, 1 reply; 235+ messages in thread From: Linus Torvalds @ 2009-12-06 0:52 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: LKML, ACPI Devel Maling List, pm list On Sun, 6 Dec 2009, Rafael J. Wysocki wrote: > > > It all looks terminally broken: you force async suspend for all PCI > > drivers, even when it makes no sense. > > I'm not exactly sure what you're referring to. The async suspend is not > forced, it just tells the PM core that it can execute PCI suspend/resume > callbacks in parallel as long as the devices in question don't depend on each > other. That's exactly what I mean by forcing async suspend/resume. You don't know the ordering rules for PCI devices. Multi-function PCI devices commonly share registers - they're on the same chip, after all. And even when the _hardware_ is totally independent, we often have discovery rules and want to initialize in order because different drivers will do things like unregister entirely on suspend, and then re-register on resume. Imagine the mess when two ethernet devices randomly end up coming up with different names (eth0/eth1) depending on subtle timing issues. THAT is why we do things in order. Asynchronous programming is _hard_. Just deciding that "all PCI devices can always be resumed and suspended asynchronously" is a much MUCH bigger decision than you seem to have even realized. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread
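The eth0/eth1 concern can be made concrete with a toy model of first-come-first-served naming. This is only an illustration under stated assumptions: `assign_names` and `NAME_LEN` are hypothetical, and the model simply assumes that a driver which unregisters on suspend and re-registers on resume receives the next free name at the moment it re-registers.

```c
#include <assert.h>
#include <stdio.h>

#define NAME_LEN 16

/* Toy model: whichever device finishes re-registering first gets eth0,
 * the next one eth1, and so on.  order[k] is the index of the device
 * that registers k-th; names[d] receives the name given to device d. */
void assign_names(const int *order, int n, char names[][NAME_LEN])
{
	int k;

	for (k = 0; k < n; k++) {
		assert(order[k] >= 0 && order[k] < n);
		snprintf(names[order[k]], NAME_LEN, "eth%d", k);
	}
}
```

With a fixed resume order of {0, 1}, device 0 is always eth0. If the two resume callbacks run in parallel and device 1 happens to finish first (order {1, 0}), device 0 comes back as eth1, which is the timing-dependent renaming described in the message above.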
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 0:52 ` Linus Torvalds @ 2009-12-06 1:24 ` Rafael J. Wysocki 2009-12-06 1:50 ` Linus Torvalds 0 siblings, 1 reply; 235+ messages in thread From: Rafael J. Wysocki @ 2009-12-06 1:24 UTC (permalink / raw) To: Linus Torvalds; +Cc: LKML, ACPI Devel Maling List, pm list On Sunday 06 December 2009, Linus Torvalds wrote: > > On Sun, 6 Dec 2009, Rafael J. Wysocki wrote: > > > > > It all looks terminally broken: you force async suspend for all PCI > > > drivers, even when it makes no sense. > > > > I'm not exactly sure what you're referring to. The async suspend is not > > forced, it just tells the PM core that it can execute PCI suspend/resume > > callbacks in parallel as long as the devices in question don't depend on each > > other. > > That's exactly what I mean by forcing async suspend/resume. > > You don't know the ordering rules for PCI devices. That's true at the moment, but in principle we can abstract all dependences between devices as PM links that will enforce specific suspend/resume ordering between them. > Multi-function PCI devices commonly share registers - they're on the same > chip, after all. And even when the _hardware_ is totally independent, we > often have discovery rules and want to initialize in order because different > drivers will do things like unregister entirely on suspend, and then > re-register on resume. Do any of the PCI drivers do that? > Imagine the mess when two ethernet devices randomly end up coming up with > different names (eth0/eth1) depending on subtle timing issues. > > THAT is why we do things in order. Asynchronous programming is _hard_. > Just deciding that "all PCI devices can always be resumed and suspended > asynchronously" is a much MUCH bigger decision than you seem to have > even realized. I have considered that, but at the end of the day I haven't seen a single problem with that showing up in testing during the last two or three months. 
Given the time the patchset spent in linux-next I'd expect someone to report a problem with it - if there's a problem. But no one has said a word, so I'm not that worried, although I'm still a bit cautious. That's why there is the switch for disabling the feature altogether. It is enabled by default, which perhaps is not the right setting, but I don't really see the reason why not to turn it on where it doesn't break things (like on all of my test boxes at the moment). Still, as I said before, the other changes in my pull request are more important to me than the async patchset, so please let me know if they are fine with you. Thanks, Rafael ^ permalink raw reply [flat|nested] 235+ messages in thread
* Re: [GIT PULL] PM updates for 2.6.33 2009-12-06 1:24 ` Rafael J. Wysocki @ 2009-12-06 1:50 ` Linus Torvalds 0 siblings, 0 replies; 235+ messages in thread From: Linus Torvalds @ 2009-12-06 1:50 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: LKML, ACPI Devel Maling List, pm list On Sun, 6 Dec 2009, Rafael J. Wysocki wrote: > > > Multi-function PCI devices commonly share registers - they're on the same > > chip, after all. And even when the _hardware_ is totally independent, we > > often have discovery rules and want to initialize in order because different > > drivers will do things like unregister entirely on suspend, and then > > re-register on resume. > > Do any of the PCI drivers do that? It used to be common at least for ethernet - there were a number of drivers that essentially did the same thing on suspend/resume and on module unload/reload. The point is, I don't know. And neither do you. It's much safer to just do drivers one by one, and not touch drivers that people don't test. Linus ^ permalink raw reply [flat|nested] 235+ messages in thread