linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Lyude Paul <lyude@redhat.com>
Cc: Lukas Wunner <lukas@wunner.de>,
	nouveau@lists.freedesktop.org, David Airlie <airlied@linux.ie>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Ben Skeggs <bskeggs@redhat.com>,
	Linux PM <linux-pm@vger.kernel.org>
Subject: Re: [Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
Date: Wed, 18 Jul 2018 09:40:44 +0200	[thread overview]
Message-ID: <CAJZ5v0jQXwxXCzxiLcVnBQPvrAX=GRUr2O=TknmhAS3U_ZTAtg@mail.gmail.com> (raw)
In-Reply-To: <2dbe75b1a83c025b9cddc229dbca9af6fb30111e.camel@redhat.com>

On Tue, Jul 17, 2018 at 8:34 PM, Lyude Paul <lyude@redhat.com> wrote:
> On Tue, 2018-07-17 at 20:32 +0200, Lukas Wunner wrote:
>> On Tue, Jul 17, 2018 at 02:24:31PM -0400, Lyude Paul wrote:
>> > On Tue, 2018-07-17 at 20:20 +0200, Lukas Wunner wrote:
>> > > Okay, the PCI device is suspending and the nvkm_i2c_aux_acquire()
>> > > wants it in resumed state, so is waiting forever for the device to
>> > > runtime suspend in order to resume it again immediately afterwards.
>> > >
>> > > The deadlock in the stack trace you've posted could be resolved using
>> > > the technique I used in d61a5c106351 by adding the following to
>> > > include/linux/pm_runtime.h:
>> > >
>> > > static inline bool pm_runtime_status_suspending(struct device *dev)
>> > > {
>> > >   return dev->power.runtime_status == RPM_SUSPENDING;
>> > > }
>> > >
>> > > static inline bool is_pm_work(struct device *dev)
>> > > {
>> > >   struct work_struct *work = current_work();
>> > >
>> > >   return work && work->func == dev->power.work;
>> > > }
>> > >
>> > > Then adding this to nvkm_i2c_aux_acquire():
>> > >
>> > >   struct device *dev = pad->i2c->subdev.device->dev;
>> > >
>> > >   if (!(is_pm_work(dev) && pm_runtime_status_suspending(dev))) {
>> > >           ret = pm_runtime_get_sync(dev);
>> > >           if (ret < 0 && ret != -EACCES)
>> > >                   return ret;
>> > >   }
>> > >
>> > > But here's the catch:  This only works for an *async* runtime suspend.
>> > > It doesn't work for pm_runtime_put_sync(), pm_runtime_suspend() etc,
>> > > because then the runtime suspend is executed in the context of the caller,
>> > > not in the context of dev->power.work.
>> > >
>> > > So it's not a full solution, but hopefully something that gets you
>> > > going.  I'm not really familiar with the code paths leading to
>> > > nvkm_i2c_aux_acquire() to come up with a full solution off the top
>> > > of my head I'm afraid.
>> >
>> > OK-I was considering doing something similar to that commit beforehand but I
>> > wasn't sure if I was going to just be hacking around an actual issue. That
>> > doesn't seem to be the case. This is very helpful and hopefully I should be
>> > able
>> > to figure something out from this, thanks!
>>
>> In some cases, the function acquiring the runtime PM ref is only called
>> from a couple of places and then it would be feasible and appropriate
>> to add a bool parameter to the function telling it to acquire the ref
>> or not.  So the function is told using a parameter which context it's
>> running in:  In the runtime_suspend code path or some other code path.
>>
>> The technique to use current_work() is an alternative approach to figure
>> out the context if passing in an additional parameter is not feasible
>> for some reason.  That was the case with d61a5c106351.  That approach
>> only works for work items though.
>
> Something I'm curious about. This isn't the first time I've hit a situation like
> this (see: the improper disable_depth fix I added into amdgpu I now need to go
> and fix), which makes me wonder: is there actually any reason Linux's runtime PM
> core doesn't just turn get/puts() in the context of s/r callbacks into no-ops by
> default?

Because it's hard to detect reliably enough and because hiding issues
is a bad idea in general.

As I've just said in the message to Lukas, the fact that you need to
resume another device from within your resume callback indicates that
you're hiding your dependency graph from the core.

Thanks,
Rafael

  reply	other threads:[~2018-07-18  7:40 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16 23:59 [PATCH 0/5] drm/nouveau: Fix a lot of nasty RPM bugs and deadlocks Lyude Paul
2018-07-16 23:59 ` [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths Lyude Paul
2018-07-17  7:16   ` [Nouveau] " Lukas Wunner
2018-07-17  7:39     ` Rafael J. Wysocki
2018-07-17 16:53     ` Lyude Paul
2018-07-17 18:20       ` Lukas Wunner
2018-07-17 18:24         ` Lyude Paul
2018-07-17 18:32           ` Lukas Wunner
2018-07-17 18:34             ` Lyude Paul
2018-07-18  7:40               ` Rafael J. Wysocki [this message]
2018-07-18  7:47               ` Lukas Wunner
2018-07-18  7:38         ` Rafael J. Wysocki
2018-07-18  8:25           ` Lukas Wunner
2018-07-18  8:35             ` Rafael J. Wysocki
2018-07-18  8:36             ` Lukas Wunner
2018-07-18 20:11               ` Lyude Paul
2018-07-18 21:49                 ` Rafael J. Wysocki
2018-07-16 23:59 ` [PATCH 2/5] drm/nouveau: Grab RPM ref while probing outputs Lyude Paul
2018-07-17  7:21   ` [Nouveau] " Lukas Wunner
2018-07-17 10:12     ` Karol Herbst
2018-07-16 23:59 ` [PATCH 3/5] drm/nouveau: Add missing RPM get/put() when probing connectors Lyude Paul
2018-07-17 10:11   ` [Nouveau] " Karol Herbst
2018-07-16 23:59 ` [PATCH 4/5] drm/nouveau: Grab RPM ref when i2c bus is in use Lyude Paul
2018-07-17 10:17   ` [Nouveau] " Karol Herbst
2018-07-17 11:54     ` Ben Skeggs
2018-07-17 12:10       ` Karol Herbst
2018-07-16 23:59 ` [PATCH 5/5] drm/nouveau: Grab RPM ref when aux " Lyude Paul

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJZ5v0jQXwxXCzxiLcVnBQPvrAX=GRUr2O=TknmhAS3U_ZTAtg@mail.gmail.com' \
    --to=rafael@kernel.org \
    --cc=airlied@linux.ie \
    --cc=bskeggs@redhat.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=lyude@redhat.com \
    --cc=nouveau@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).