From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753296AbaBXTgG (ORCPT ); Mon, 24 Feb 2014 14:36:06 -0500 Received: from iolanthe.rowland.org ([192.131.102.54]:42662 "HELO iolanthe.rowland.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1752980AbaBXTgD (ORCPT ); Mon, 24 Feb 2014 14:36:03 -0500 Date: Mon, 24 Feb 2014 14:36:02 -0500 (EST) From: Alan Stern X-X-Sender: stern@iolanthe.rowland.org To: "Rafael J. Wysocki" cc: Linux PM list , Mika Westerberg , Aaron Lu , ACPI Devel Maling List , LKML Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices In-Reply-To: <6679559.WlyTU99iuj@vostro.rjw.lan> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 24 Feb 2014, Rafael J. Wysocki wrote: > Also, it may not do that today and I'd like to introduce a mechanism by which > that optimizatiom may be enabled by subsystems/drivers when ready. > > So for example today there is no guarantee that each device will be resumed > as appropriate by whoever handles its children when necessary, because that > code may expect the device to be operational when it is running (precisely > because we used to resume that device in .prepare()). > Yes. In particular, I'd like the "child" subsystem to be able to let the > core know that it is fine to leave the parent suspended. > That said the patchset doesn't really do both. It only really is about > skipping the ->suspend callbacks if possible, but the *consequence* of that > is that *system* resume ->resume callbacks cannot be used to resume the > device any more in general. That's because ->resume_early may try to > reverse the ->suspend_late's actions and so on, so if ->suspend_late hasn't > run, it would be a bug to run ->resume_early for that device. > I agree. For this reason, I think that the core has no choice but to treat > power.ignore_children set as "well, that device may need to be operational > going forward". This discussion is getting a little messy. Let's try to clarify it. Here is the major point: We would like to save time during system suspend/resume by skipping over devices that are already in runtime suspend, whenever it is safe to do so. Of course, the "it is safe to do so" part is what makes this difficult. It boils down to three characteristics for each device. (a) The device uses the same power state for runtime suspend and system suspend. Therefore, if the device is already in runtime suspend and the wakeup settings are correct, there is no need for the PM core to invoke the device's ->suspend callbacks. This requires a few comments. The matter of whether the same power state is used for both types of suspend is generally known beforehand, because it is decided by the subsystem or driver. The matter of whether the wakeup settings are correct often can't be known until the system suspend starts, because userspace can select whether or not wakeup should be enabled during a system sleep. Also, if the PM core is going to skip the ->suspend callbacks then it is obliged to skip the ->resume callbacks as well (we mustn't call one without the other). Therefore, in cases where (a) holds, the device will necessarily emerge from the system resume in a runtime-suspended state. This may or may not cause problems for the device's children; see below. (b) It's okay for the device's parent to be in runtime suspend when the device's ->suspend callbacks are invoked. I included this just to be thorough. In fact, I expect (b) to be true for pretty much every device already. Or if it isn't true for some devices, this is because of a special arrangement between the device's subsystem and the parent's subsystem. For example, the parent might always be runtime-resumed by its subsystem at the start of a system suspend (which is what PCI does now; I don't know if it is necessary). In the absence of any sort of special arrangement, if (b) wasn't true for some device then that device would already be experiencing problems going into system suspend. So (b) should not cause much difficulty. And if a special arrangement is present, it is a private matter between the two subsystems, not involving the PM core. (c) It's okay for the device's parent to be in runtime suspend when the device's ->resume callbacks are invoked. Unlike (b), I expect that (c) does _not_ hold for quite a few devices currently. The reason is historical: When runtime PM was first implemented, we decided that all devices should emerge from system resume in the RPM_ACTIVE state, even if they were in runtime suspend when the system suspend started. Therefore drivers could depend on the parent not being in runtime suspend while the device's ->resume callback was running. I don't think it is a good idea to perpetuate an accident of history. Instead of adding a special mechanism to the PM core for accomodating devices where (c) doesn't hold, I think we should fix up the drivers so that (c) _does_ hold everywhere. Maybe you disagree. Anyway, let's assume first that things are all fixed up, and (b) and (c) hold for every device. This means we can go back and consider (a). Since the "same power state for both types of suspend" answer is known beforehand, let's concentrate on devices where it is true (other devices will simply continue to operate as they do today). For these devices, the driver or subsystem will have to compute a flag value -- let's call it "same_wakeup_setting". The PM core can't do this because it doesn't understand the device-specific details of wakeup settings. Then your proposal comes down to this: If ->prepare returns > 0, the PM core sets the same_wakeup_setting flag. If same_wakeup_setting is on and the device is in runtime suspend, the PM core skips the various ->suspend and ->resume callbacks. My proposal was never made explicit, but it would take a form something like this: During ->prepare, the subsystem sets the same_wakeup_setting flag appropriately. If same_wakeup_setting is on and the device is in runtime suspend, the subsystem's ->suspend and ->resume callbacks return immediately. There isn't very much difference between the two proposals. Mine is a little more flexible; for example, it allows the subsystem to return immediately from ->suspend but have ->resume put the device back in the RPM_ACTIVE state. Now let's change course and suppose that (c) _doesn't_ hold for a large selection of devices. As a simple consequence, if (c) doesn't hold for some device then (a) can't be allowed to hold for any ancestor of that device. (I'm disregarding the power.ignore_children flag.) Your proposal would take the same_wakeup_setting flag (you called it "fast_suspend"), used for answering (a), and combine it with the answer to (c). That is, you would have ->prepare return > 0 only if (a) and (c) both hold -- and in addition, you turn off the flag if it is off in any child. In practice, I suspect this means fast_suspend will end up affecting only leaf devices: those with no children. This is partly because many devices don't have a .prepare method; there are plenty of entries in the device tree that don't correspond to physical devices (e.g., class devices, or devices present only for their sysfs attributes) and hence have no PM support at all. Even though (c) does hold for such devices, the PM core won't realize it. In addition, I don't like the way your proposal mixes together the answers to (a) and (c). If they could be kept separate, I think you could do a better job. For instance, suppose (a) is true for some device, but (c) is false for one of its children. Then the PM core could skip the ->suspend and ->resume callbacks for that device, and it could do a pm_runtime_resume on the device before resuming the child. The end result would be a single ->runtime_resume call, instead of ->runtime_resume followed by ->suspend and then ->resume. Alan Stern