On Thu, Nov 28, 2019 at 11:03:57PM +0100, Rafael J. Wysocki wrote:
> On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding wrote:
> > > >
> > > > From: Thierry Reding
> > > >
> > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > reference for devices before system sleep is entered. This is needed
> > > > to avoid potential issues related to devices' parents getting put to
> > > > runtime suspend at the wrong time and causing problems with their
> > > > children.
> > >
> > > Not only for that.
> > >
> > > > In some cases drivers are carefully written to avoid such issues and
> > > > the default behaviour can be changed to allow runtime PM to operate
> > > > regularly during system sleep.
> > >
> > > But this change breaks quite a few assumptions in the core too, so no,
> > > it can't be made.
> >
> > Anything in particular that I can look at? I'm not seeing any issues
> > when I test this, which could of course mean that I'm just getting
> > lucky.
>
> There are races and such that you may never hit during casual testing.
>
> > One thing that irritated me is that I think this used to work. I do
> > recall testing suspend/resume a few years ago and devices would get
> > properly runtime suspended/resumed.
>
> Not true at all.
>
> The PM core has always taken PM-runtime references on all devices pretty much
> since when PM-runtime was introduced.

You're right. I was finally able to find a toolchain that I could build
an old version of the kernel with.
I tested system suspend/resume on the v4.8 release, which is the first
one that had the runtime PM changes as well as the subsystem
suspend/resume support wired up, and I can't see the runtime PM
callbacks invoked during system suspend/resume. So I must be
misremembering, or I'm confusing it with some other tests I was running
at the time.

> > I did some digging but couldn't
> > find anything that would have had an impact on this.
> >
> > Given that this is a completely opt-in feature, why are you categorically
> > NAK'ing this?
>
> The general problem is that if any device has been touched by system-wide
> suspend code, it should not be subject to PM-runtime any more until the
> subsequent system-wide resume is able to undo whatever the suspend did.
>
> Moreover, if a device is runtime-suspended, the system-wide suspend code
> may mishandle it, in general. That's why PM-runtime suspend is not allowed
> during system-wide transitions at all. And it has always been like that.

For this particular use-case the above should all be irrelevant. None of
the drivers involved here do anything special at system suspend, because
runtime suspend already puts the devices into the lowest possible power
state. Basically when these devices are put into runtime suspend, they
are completely turned off. The only exception is for things like HDMI
where the +5V pin remains powered, so that hotplug detection will work.

The runtime PM state of the devices involved is managed by the subsystem
system suspend/resume helpers in DRM/KMS. Basically those helpers turn
off all the devices in the composite device, which ultimately results in
their last runtime PM reference being released. So for system suspend
and resume, these devices aren't touched, other than maybe for the PM
core's internal book-keeping.

> For a specific platform you may be able to overcome these limitations if
> you are careful enough, but certainly they are there in general and surely
> you cannot prevent people from using your opt-in just because they think
> that they know what they are doing.

That's true. But the same thing is true for pretty much all other APIs.
People obviously have to make sure they know what they're doing, just
like they have to with any other API.

I suppose the documentation for this new function is currently lacking a
bit. Perhaps adding a big warning to this and listing the common
pitfalls would help people make the right call about whether or not they
can use this.

> > Is there some other alternative that I can look into?
>
> First of all, ensure that the dpm_list ordering is what it should be on the
> system/platform in question. That can be done with the help of device links.

I don't think we have device links for everything, but the deferred
probe code should take care of ordering the dpm_list correctly because
we do handle deferred probe properly in all cases.

Also, the dpm_list ordering isn't very critical in this case. If the
devices are allowed to runtime suspend during system sleep, the
subsystem sleep helper will put them into runtime suspend at the correct
time. This is propagated all the way through the display pipeline and
that order is ensured by the subsystem helpers.

> In addition, make sure that the devices needed to suspend other devices are
> suspended in the noirq phase of system-wide suspend and resumed in the
> noirq phase of system-wide resume. Or at least all of the other devices
> need to be suspended before them and resumed after them.

We're fine on this front as well. We have run into such issues in the
past, but I don't think there are any such issues left at the moment.
I do have one pending fix for I2C suspend/resume which fixes an issue
where some pinmuxing changes needed to get the HDMI DDC channel to work
were not getting applied during resume.

That I2C issue is related to this, I think. What I'm seeing is that when
the system goes to sleep, the pinmux loses its programming at a hardware
level, but the I2C driver doesn't know about it because it does not get
runtime suspended. At runtime suspend it would switch the pinmux state
to "idle" which would then match the system suspend state. Upon runtime
resume it sets the "default" pinmux state, which will then restore the
register programming. In the current case where runtime suspend/resume
is prohibited during system sleep, upon resume the I2C driver will
assume that the pinmux state is still "default" and it won't reapply the
state (it's actually the pinmux subsystem that makes this decision),
which causes HDMI DDC transactions to time out.

One simple fix for that is to use pm_runtime_force_suspend() and
pm_runtime_force_resume() as system suspend/resume callbacks to make
sure the I2C controller is runtime suspended/resumed during system
sleep.

Note that forcing runtime suspend/resume this way is suboptimal in the
DRM/KMS case because the suspend/resume happens disconnected from the
subsystem suspend/resume callbacks, which is not desired as that breaks
some of the assumptions in those callbacks.

> These two things should allow you to cover the vast majority of cases if
> not all of them without messing up with the rules.

One alternative that I had thought about was to just ditch the runtime
PM callbacks for this. However, there's one corner case where this may
break. On early Tegra generations, the two display controllers are
"coupled" in that the second one doesn't work if the first one is
disabled. We describe that using a device link from the second to the
first controller.
This causes the first controller to be automatically runtime resumed
when the second controller is used. This only works via runtime PM, so
if I don't use runtime PM I'd have to add special handling for that
case.

Actually, there's another problem as well. Most of these devices use
generic PM domains to power on/off the SoC partitions that they're in.
If I side-step runtime PM, then I'd have to somehow find a way to
explicitly control the PM domains.

Another alternative would be to have a kind of hybrid approach where I
leave runtime PM calls in the drivers, but disconnect the runtime PM
callback implementations from that. That would at least fix the issue
with the generic PM domains. However, it would not fix the problem with
coupled display controllers because empty runtime PM callbacks wouldn't
actually power up the first display controller when it is needed by the
second controller. I would have to add infrastructure that basically
duplicates some of runtime PM to fix that.

So the bottom line is that runtime PM is still the best solution for
this problem. It works really nicely and is very consistent. Do you
think adding better documentation to this new flag and the accessors
would help remove your concerns about this?

Thierry
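
For illustration only, the I2C pinmux behaviour described above can be
modelled with a small user-space C sketch. This is a toy model and not
kernel code: every name in it (toy_pinmux, toy_select_state() and so on)
is hypothetical and is not the pinctrl or runtime PM API. It assumes
that the state selector skips reprogramming when the requested state
matches the cached one, and that system sleep wipes the hardware
programming without touching that cache.

```c
#include <stdbool.h>

/* Toy model only: hypothetical names, not the kernel pinctrl API. */
enum pin_state { PIN_NONE, PIN_DEFAULT, PIN_IDLE };

struct toy_pinmux {
	enum pin_state cached;	/* state software believes is programmed */
	enum pin_state hw;	/* state actually present in hardware */
};

/* Assumed behaviour: do nothing if the cached state already matches. */
static void toy_select_state(struct toy_pinmux *p, enum pin_state s)
{
	if (p->cached == s)
		return;

	p->hw = s;
	p->cached = s;
}

/* System sleep: hardware loses its programming, the cache is untouched. */
static void toy_system_sleep(struct toy_pinmux *p)
{
	p->hw = PIN_NONE;
}

/* Runtime suspend selects "idle", runtime resume selects "default". */
static void toy_runtime_suspend(struct toy_pinmux *p)
{
	toy_select_state(p, PIN_IDLE);
}

static void toy_runtime_resume(struct toy_pinmux *p)
{
	toy_select_state(p, PIN_DEFAULT);
}

/*
 * Run one system sleep cycle. Returns true if the hardware ends up in
 * the "default" state afterwards, i.e. if DDC transfers could work.
 */
bool toy_sleep_cycle(bool force_runtime_pm)
{
	struct toy_pinmux p = { PIN_NONE, PIN_NONE };

	toy_runtime_resume(&p);		/* device in use: "default" */

	if (force_runtime_pm)
		toy_runtime_suspend(&p);	/* cache becomes "idle" */

	toy_system_sleep(&p);

	/* On resume "default" is requested again in both cases. */
	toy_runtime_resume(&p);

	return p.hw == PIN_DEFAULT;
}
```

In this model toy_sleep_cycle(false) returns false, because the request
for "default" after resume is skipped as a no-op while the hardware is
unprogrammed. toy_sleep_cycle(true) returns true, because the cached
state is "idle" at resume time, so "default" is actually reprogrammed.
The forced runtime suspend/resume path plays the role that
pm_runtime_force_suspend() and pm_runtime_force_resume() would play as
system sleep callbacks in the real driver.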