From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751914AbdJPBnj (ORCPT ); Sun, 15 Oct 2017 21:43:39 -0400
Received: from cloudserver094114.home.net.pl ([79.96.170.134]:63156 "EHLO cloudserver094114.home.net.pl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751804AbdJPBnJ (ORCPT ); Sun, 15 Oct 2017 21:43:09 -0400
From: "Rafael J. Wysocki"
To: Linux PM
Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson, Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c@vger.kernel.org, Lee Jones
Subject: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
Date: Mon, 16 Oct 2017 03:12:35 +0200
Message-ID: <3806130.B2KCK0tvef@aspire.rjw.lan>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi All,

Well, this took more time than expected, as I tried to cover everything I had in mind regarding PM flags for drivers.

This work was triggered by attempts to fix and optimize PM in the i2c-designware-platdev driver that ended up with adding a couple of flags to the driver's internal data structures for the tracking of device state (https://marc.info/?l=linux-acpi&m=150629646805636&w=2). That approach is sort of suboptimal, though, because other drivers will probably want to do similar things and if all of them need to use internal flags for that, quite a bit of code duplication may ensue at least. That can be avoided in a couple of ways and one of them is to provide a means for drivers to tell the core what to do and to make the core take care of it if told to do so. Hence, the idea to use driver flags for system-wide PM that was briefly discussed during the LPC in LA last month.
One of the flags considered at that time was to possibly cause the core to reuse the runtime PM callback path of a device for system suspend/resume. Admittedly, that idea didn't look too bad to me until I had started to try to implement it and I got to the PCI bus type's hibernation callbacks. Then, I moved the patch I was working on to /dev/null right away. I mean it. No, this is not going to happen. No way.

Moreover, that experience made me realize that the whole *idea* of using the runtime PM callback path for system-wide PM was actually totally bogus (sorry Ulf).

The whole point of having different callback pointers for different types of device transitions is that it may be necessary to do different things in those callbacks in general. Now, if you consider runtime PM and system suspend/resume *only* and from a driver perspective, then yes, in some cases the same pair of callback routines may be used for all suspend-like and resume-like transitions of the device, but if you add hibernation to the mix, then it is not so clear any more unless the callbacks don't actually do any power management at all, but simply quiesce the device's activity and then activate it again. Namely, changing power states of devices during the hibernation's "freeze" and "thaw" transitions rarely makes sense at all and the "restore" transition needs to be able to cope with uninitialized devices (in fact, it should be prepared to cope with devices in *any* state), so runtime PM is hardly suitable for them.

Still, if a *driver* chooses to not do any real PM in its PM callbacks and leaves that to a middle layer (quite a few drivers do that), then it possibly can use one pair of callbacks in all cases and be happy, but middle layers pretty much have to use different callback routines for different transitions.

If you are a middle layer, your role is basically to do PM for a certain group of devices.
Thus you cannot really do the same in ->suspend or ->suspend_early and in ->runtime_suspend (because the former generally need to take device_may_wakeup() into account and the latter doesn't) and you shouldn't really do the same in ->suspend and ->freeze (because the latter shouldn't change the device's power state) and so on.

To put it bluntly, trying to use the ->runtime_suspend callback of a middle layer for anything other than runtime suspend is complete and utter nonsense. At the same time, the ->runtime_resume callback of a middle layer may be reused to some extent, but even that doesn't cover the "thaw" transitions during hibernation.

What can work (and this is the only strategy that can work AFAICS) is to point different callback pointers *in* *a* *driver* to the same routine if the driver wants to reuse that code. That actually will work for PCI and USB drivers today, at least most of the time, but unfortunately there are problems with it for, say, platform devices.

The first problem is the requirement to track the status of the device (suspended vs not suspended) in the callbacks, because the system-wide PM code in the PM core doesn't do that. The runtime PM framework does it, so this means adding some extra code which isn't necessary for runtime PM to the callback routines and that is not particularly nice.

The second problem is that, if the driver wants to do anything in its ->suspend callback, it generally has to prevent runtime suspend of the device from taking place in parallel with that, which is quite cumbersome. Usually, that is taken care of by resuming the device from runtime suspend upfront, but generally doing that is wasteful (there may be no real need to resume the device except for the fact that the code is designed this way).

On top of the above, there are optimizations to be made, like leaving certain devices in suspend after system resume to avoid wasting time on waiting for them to resume before user space can run again and similar.
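
To make the first problem above more concrete, here is a minimal user-space sketch of a driver that points its system-wide and runtime PM callback pointers at the same pair of routines. It is only an illustration under stated assumptions: struct toy_dev, struct toy_pm_ops and all of the toy_*() functions are hypothetical stand-ins made up for this example, not kernel interfaces, and the hibernation callbacks are left out on purpose. The "suspended" field is the kind of driver-internal status tracking that the shared routines end up needing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device; NOT the kernel's struct device. */
struct toy_dev {
	bool suspended;		/* driver-internal status flag */
	int hw_off_calls;	/* times the "hardware" was really powered down */
	int hw_on_calls;	/* times the "hardware" was really powered up */
};

/* Hypothetical, trimmed-down model of a driver's PM callback table. */
struct toy_pm_ops {
	int (*suspend)(struct toy_dev *dev);
	int (*resume)(struct toy_dev *dev);
	int (*runtime_suspend)(struct toy_dev *dev);
	int (*runtime_resume)(struct toy_dev *dev);
};

/* Shared suspend-like routine; the flag check is the extra tracking code. */
static int toy_suspend(struct toy_dev *dev)
{
	if (dev->suspended)
		return 0;	/* already suspended, e.g. by runtime PM */

	dev->hw_off_calls++;
	dev->suspended = true;
	return 0;
}

/* Shared resume-like routine, with the matching flag check. */
static int toy_resume(struct toy_dev *dev)
{
	if (!dev->suspended)
		return 0;

	dev->hw_on_calls++;
	dev->suspended = false;
	return 0;
}

/* Different callback pointers in the driver point to the same routines. */
static const struct toy_pm_ops toy_ops = {
	.suspend = toy_suspend,
	.resume = toy_resume,
	.runtime_suspend = toy_suspend,
	.runtime_resume = toy_resume,
};

/* Runtime-suspend the device first, then do a system suspend and resume. */
static void toy_demo(struct toy_dev *dev)
{
	toy_ops.runtime_suspend(dev);
	toy_ops.suspend(dev);	/* must not touch the hardware again */
	assert(dev->hw_off_calls == 1);

	toy_ops.resume(dev);
	assert(!dev->suspended);
}
```

Calling toy_demo() on a zero-initialized struct toy_dev powers the toy hardware down once and up once, even though two suspend-like callbacks ran. The sketch says nothing about the second problem (runtime suspend racing with ->suspend), which a single-threaded model like this cannot show.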
This patch series focuses on addressing those problems so as to make it easier to reuse callback routines by pointing different callback pointers to them in device drivers. The flags introduced here are to instruct the PM core and middle layers (whatever they are) on how the driver wants the device to be handled and then the driver has to provide callbacks to match these instructions and the rest should be taken care of by the code above it.

The flags are introduced one by one to avoid making too many changes in one go and to allow things to be explained better (hopefully). They mostly are mutually independent with some clearly documented exceptions.

The first three patches in the series are about an issue with the direct-complete optimization introduced some time ago in which some middle layers decide on whether or not to do the optimization without asking the drivers. And, as it turns out, in some cases the drivers actually know better, so the new flags introduced by these patches are here for these drivers (and the DPM_FLAG_NEVER_SKIP one is really to avoid having to define ->prepare callbacks always returning zero).

The really interesting things start to happen in patches [4-9/12] which make it possible to avoid resuming devices from runtime suspend upfront during system suspend at least in some cases (and when direct-complete is not applied to the devices in question), but please refer to the changelogs for details.

The i2c-designware-platdev driver is used as the primary example in the series and the patches modifying it are based on some previous changes currently in linux-next AFAICS (the same applies to the intel-lpss driver), but these patches can wait until everything is properly merged. They are included here mostly as illustration.

Overall, the series is based on the linux-next branch of the linux-pm.git tree with some extra patches on top of it and all of the names of new entities introduced in it are negotiable.

Thanks,
Rafael