From: Peter Wu <peter-VTkQYDcBqhK7DlmcbJSQ7g@public.gmane.org>
To: Lukas Wunner <lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
	Dave Airlie <airlied-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [PATCH 1/9] drm/nouveau: Don't leak runtime pm ref on driver unload
Date: Mon, 30 May 2016 19:03:46 +0200
Message-ID: <20160530170346.GB1355@al>
In-Reply-To: <20160529155006.GA12909-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>

On Sun, May 29, 2016 at 05:50:06PM +0200, Lukas Wunner wrote:
> Hi Peter,
> 
> On Fri, May 27, 2016 at 03:07:33AM +0200, Peter Wu wrote:
> > On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
> > > nouveau_drm_load() calls pm_runtime_put() if nouveau_runtime_pm != 0,
> > > but nouveau_drm_unload() calls pm_runtime_get_sync() unconditionally.
> > > We therefore leak a runtime pm ref whenever nouveau is loaded with
> > > runpm=0 and then unloaded. The GPU will subsequently never runtime
> > > suspend even if nouveau is loaded again with runpm=1.
> > > 
> > > Fix by taking the runtime pm ref under the same condition that it was
> > > released on driver load.
> > > 
> > > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
> > > Cc: Dave Airlie <airlied@redhat.com>
> > > Reported-by: Karol Herbst <nouveau@karolherbst.de>
> > > Tested-by: Karol Herbst <nouveau@karolherbst.de>
> > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > 
> > Looks good, I tested this scenario:
> > 
> >     ru(){ cat /sys/bus/pci/devices/0000\:01:00.0/power/runtime_usage;}
> >     ru # reports 1
> >     modprobe nouveau runpm=0
> >     ru # reports 2
> >     rmmod nouveau
> >     ru # reports 1
> > 
> > Without runpm=0 the count drops to 0 in the second step and stays 0 in
> > the third step. After applying patch 2/9, this correctly reports 1 as
> > expected (this is the same as manually setting power/control to on).
> 
> How exactly did you reach the situation where the root port didn't wake
> up when you tried to load nouveau again? (IRC conversation this week.)

Ensure that the pci/pm patches are applied, then:

 0. Unload nouveau (I have blacklisted it for testing).
 1. Enable runtime PM for the root port and its children
    (power/control = auto).
 2. Verify in the kernel logs that the devices are sleeping:
        pcieport 0000:00:01.0: power state changed by ACPI to D3cold
 3. (Optional, to rule out issues with delays:) Disable runtime PM for
    the Nvidia device (power/control = on).
 4. modprobe nouveau.
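
Steps 1 and 3 can be sketched roughly as below. This is a minimal sketch,
not the exact commands used for the test: the device addresses
(0000:00:01.0 for the root port, 0000:01:00.0 for the Nvidia device) are
taken from the logs in this mail, "children" is assumed to mean the
devices directly below the root port, and SYSFS_PCI is a hypothetical
override that only exists so the functions can be tried on a scratch
directory.

```shell
#!/bin/sh
# SYSFS_PCI is a hypothetical override (not a kernel or distro
# convention); by default the real sysfs PCI device directory is used.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci/devices}

# set_rpm_control DEVICE MODE
# Write MODE ("auto" or "on") to the power/control attribute of DEVICE.
set_rpm_control() {
    printf '%s\n' "$2" > "$SYSFS_PCI/$1/power/control"
}

# enable_rpm_tree PORT
# Allow runtime PM for PORT and for every PCI device directly below it.
enable_rpm_tree() {
    set_rpm_control "$1" auto
    for child in "$SYSFS_PCI/$1"/????:??:??.?; do
        # Skip the unexpanded pattern when the port has no children.
        [ -e "$child/power/control" ] || continue
        printf 'auto\n' > "$child/power/control"
    done
}

# Example (as root, with nouveau unloaded):
#   enable_rpm_tree 0000:00:01.0        # step 1
#   set_rpm_control 0000:01:00.0 on     # step 3 (optional)
```

Sourcing the file only defines the functions; nothing is written to
sysfs until one of them is called.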

The above test with v4.6 + 4 pci/pm patches (8b71f565) gives:

    50.245795 MXM: GUID detected in BIOS
    50.245948    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492
    50.246016 ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    50.246044    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492
    50.246110    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
    50.246256 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    50.246289    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
    50.246443 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    50.246457    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
    50.246932 pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
    50.247005 VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
    50.247084    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PG00._ON] at AML address ffffc9000001086e length 11D
    50.390140 pcieport 0000:00:01.0: power state changed by ACPI to D0
    50.491893    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0._DSW] at AML address ffffc90000010a2d length 1D
    50.492285 pcieport 0000:00:01.0: PME# disabled
    50.492583 nouveau 0000:01:00.0: unknown chipset (ffffffff)
    50.492687 nouveau: probe of 0000:01:00.0 failed with error -12
    50.501990    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0._S0W] at AML address ffffc90000010a8e length 2
    50.502403 pcieport 0000:00:01.0: PME# enabled
    50.502601    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0._DSW] at AML address ffffc90000010a2d length 1D
    50.513005    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PG00._OFF] at AML address ffffc90000010994 length 6D
    50.533258 pcieport 0000:00:01.0: power state changed by ACPI to D3cold

(Note that this patch is not included.) When nouveau is operating
normally, I see that _PS0 is also called (which does not happen above).
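
For repeated runs, the two things to look for in such a log can be
checked mechanically. The helpers below are a hypothetical sketch (the
function names are made up for illustration); they read a kernel log on
stdin and match only the message formats shown above. An all-ones
chipset ID is what one would expect if reads from the GPU returned
0xffffffff, which commonly happens when a PCI device is not accessible,
but that is an interpretation and not something the log proves.

```shell
#!/bin/sh
# last_acpi_state DEVICE
# Print the last ACPI power state logged for DEVICE, for example "D0"
# or "D3cold". Prints nothing if no matching line is found.
last_acpi_state() {
    sed -n "s/.*$1: power state changed by ACPI to \(D[0-9a-z]*\).*/\1/p" |
        tail -n 1
}

# probe_saw_all_ones
# Succeed if nouveau reported an all-ones chipset ID during probe.
probe_saw_all_ones() {
    grep -q 'nouveau .*unknown chipset (ffffffff)'
}

# Example:
#   dmesg | last_acpi_state 0000:00:01.0
#   dmesg | probe_saw_all_ones && echo "GPU was not readable at probe"
```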

In case you suspect that mixing power resources with DSM causes this
issue: I also tried applying my power resources work for nouveau, but it
shows the same problem:

    20.183306 MXM: GUID detected in BIOS
    20.183606 ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    20.184158 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    20.184547 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    20.185152 pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
    20.185351 VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
    20.185384 nouveau: detected PR support, will not use DSM
    20.185552 nouveau 0000:01:00.0: enabling device (0000 -> 0003)
    20.185873 nouveau 0000:01:00.0: unknown chipset (ffffffff)
    20.185946 nouveau: probe of 0000:01:00.0 failed with error -12

> What's happening is, the PCI core will keep unbound devices (i.e.,
> without driver) in D0 but the runtime status is allowed to change
> to "suspended". So it'll appear to the kernel as if it was suspended
> but in reality it stays in D0.
> 
> Once runtime pm for PCIe ports gets merged, the root port above the
> GPU will indeed go to D3 in such a situation because the check
> pm_children_suspended() (called from rpm_check_suspend_allowed())
> returns true.
> 
> I'm not sure if this is desirable or not. If we keep unbound devices
> in D0, should we allow ports above them to go to D3?

Maybe Rafael (linux-pm / linux-pci) can answer this question better?
The comments in local_pci_probe(), pci_pm_runtime_suspend() and
pci_pm_runtime_resume() suggest that unbound devices are assumed to be
in D0, which is apparently not the case when runtime PM is enabled.
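
To see what the PM core believes about an unbound device, the runtime PM
attributes of the root port and the GPU can be dumped side by side. This
is only a sketch: rpm_state is a made-up helper, SYSFS_PCI is a
hypothetical override for trying it on a scratch directory, and
runtime_usage may be absent depending on the kernel configuration (it is
assumed here to need the PM debugging options). It reports the PM core's
bookkeeping only, not the power state the hardware is really in.

```shell
#!/bin/sh
# SYSFS_PCI is a hypothetical override; the default is the real sysfs
# PCI device directory.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci/devices}

# rpm_state DEVICE
# Print control, runtime_status and runtime_usage for DEVICE, one per
# line. Attributes that cannot be read are reported as "?".
rpm_state() {
    for attr in control runtime_status runtime_usage; do
        f="$SYSFS_PCI/$1/power/$attr"
        if [ -r "$f" ]; then
            printf '%s %s=%s\n' "$1" "$attr" "$(cat "$f")"
        else
            printf '%s %s=?\n' "$1" "$attr"
        fi
    done
}

# Example:
#   rpm_state 0000:00:01.0    # root port
#   rpm_state 0000:01:00.0    # Nvidia device
```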

> In any case, when nouveau is loaded again, local_pci_probe() will
> call pm_runtime_get_sync(), which will implicitly set the runtime
> status to "active" and which should also wake parents. So how did
> you ever reach a point where you loaded nouveau and the root port
> stayed asleep? Clearly we have a bug there, question is where.
> This shouldn't work only if pm_runtime_forbid() was called on
> driver unload.
> 
> Thanks for the extensive testing!
> Lukas

Both devices (root port and Nvidia) were resumed, but somehow the Nvidia
card was not fully initialized/ready (as you can see in the above logs).

Peter

> > 
> > Peter
> > 
> > > ---
> > >  drivers/gpu/drm/nouveau/nouveau_drm.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > index 11f8dd9..faf7438 100644
> > > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > @@ -498,7 +498,10 @@ nouveau_drm_unload(struct drm_device *dev)
> > >  {
> > >  	struct nouveau_drm *drm = nouveau_drm(dev);
> > >  
> > > -	pm_runtime_get_sync(dev->dev);
> > > +	if (nouveau_runtime_pm != 0) {
> > > +		pm_runtime_get_sync(dev->dev);
> > > +	}
> > > +
> > >  	nouveau_fbcon_fini(dev);
> > >  	nouveau_accel_fini(drm);
> > >  	nouveau_hwmon_fini(dev);
> > > -- 
> > > 2.8.1
> > > 
> > > _______________________________________________
> > > Nouveau mailing list
> > > Nouveau@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/nouveau
