linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Karol Herbst <kherbst@redhat.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>,
	Mika Westerberg <mika.westerberg@intel.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Rafael J . Wysocki" <rjw@rjwysocki.net>,
	Linux PCI <linux-pci@vger.kernel.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	nouveau <nouveau@lists.freedesktop.org>,
	Dave Airlie <airlied@gmail.com>,
	Mario Limonciello <Mario.Limonciello@dell.com>
Subject: Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
Date: Mon, 9 Dec 2019 13:24:04 +0100	[thread overview]
Message-ID: <CACO55tu19-14nVnnCpWz3r3nf15j6tGWzHNBRmbbs2R6O4gMCA@mail.gmail.com> (raw)
In-Reply-To: <CAJZ5v0hcONxiWD+jpBe62H1SZ-84iNxT+QCn8mcesB1C7SVWjw@mail.gmail.com>

On Mon, Dec 9, 2019 at 12:39 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Dec 9, 2019 at 12:17 PM Karol Herbst <kherbst@redhat.com> wrote:
> >
> > anybody any other ideas?
>
> Not yet, but I'm trying to collect some more information.
>
> > It seems that both patches don't really fix
> > the issue and I have no idea left on my side to try out. The only
> > thing left I could do to further investigate would be to reverse
> > engineer the Nvidia driver as they support runpm on Turing+ GPUs now,
> > but I've heard users having similar issues to the one Lyude told us
> > about... and I couldn't verify that the patches help there either in a
> > reliable way.
>
> It looks like the newer (8+) versions of Windows expect the GPU driver
> to prepare the GPU for power removal in some specific way and the
> latter fails if the GPU has not been prepared as expected.
>
> Because testing indicates that the Windows 7 path in the platform
> firmware works, it may be worth trying to do what it does to the PCIe
> link before invoking the _OFF method for the power resource
> controlling the GPU power.
>

ohh, that actually makes sense. Didn't think of that yet.

> If the Mika's theory that the Win7 path simply turns the PCIe link off
> is correct, then whatever the _OFF method tries to do to the link
> after that should not matter.
>

By the way, and I was only thinking about it after sending my last
email out, do you think we should fail the runtime resume path if the
device gets stuck in a power state?

Currently pci core always calls into the driver regardless, but maybe
for D3cold it really makes sense to just bail and refuse to resume? I
think I tried that as an early "fix" and might even have a patch
around. This should at least prevent crashes inside drivers trying to
access invalid memory or getting stuck in loops.

> > On Wed, Nov 27, 2019 at 8:55 PM Lyude Paul <lyude@redhat.com> wrote:
> > >
> > > On Wed, 2019-11-27 at 12:51 +0100, Karol Herbst wrote:
> > > > On Wed, Nov 27, 2019 at 12:49 PM Mika Westerberg
> > > > <mika.westerberg@intel.com> wrote:
> > > > > On Tue, Nov 26, 2019 at 06:10:36PM -0500, Lyude Paul wrote:
> > > > > > Hey-this is almost certainly not the right place in this thread to
> > > > > > respond,
> > > > > > but this thread has gotten so deep evolution can't push the subject
> > > > > > further to
> > > > > > the right, heh. So I'll just respond here.
> > > > >
> > > > > :)
> > > > >
> > > > > > I've been following this and helping out Karol with testing here and
> > > > > > there.
> > > > > > They had me test Bjorn's PCI branch on the X1 Extreme 2nd generation,
> > > > > > which
> > > > > > has a turing GPU and 8086:1901 PCI bridge.
> > > > > >
> > > > > > I was about to say "the patch fixed things, hooray!" but it seems that
> > > > > > after
> > > > > > trying runtime suspend/resume a couple times things fall apart again:
> > > > >
> > > > > You mean $subject patch, no?
> > > > >
> > > >
> > > > no, I told Lyude to test the pci/pm branch as the runpm errors we saw
> > > > on that machine looked different. Some BAR error the GPU reported
> > > > after it got resumed, so I was wondering if the delays were helping
> > > > with that. But after some cycles it still caused the same issue, that
> > > > the GPU disappeared. Later testing also showed that my patch also
> > > > didn't seem to help with this error sadly :/
> > > >
> > > > > > [  686.883247] nouveau 0000:01:00.0: DRM: suspending object tree...
> > > > > > [  752.866484] ACPI Error: Aborting method \_SB.PCI0.PEG0.PEGP.NVPO due
> > > > > > to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> > > > > > [  752.866508] ACPI Error: Aborting method \_SB.PCI0.PGON due to
> > > > > > previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> > > > > > [  752.866521] ACPI Error: Aborting method \_SB.PCI0.PEG0.PG00._ON due
> > > > > > to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> > > > >
> > > > > This is probably the culprit. The same AML code fails to properly turn
> > > > > on the device.
> > > > >
> > > > > Is acpidump from this system available somewhere?
> > >
> > > Attached it to this email
> > >
> > > > >
> > > --
> > > Cheers,
> > >         Lyude Paul
> >
>


  reply	other threads:[~2019-12-09 12:24 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-17 12:19 [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges Karol Herbst
2019-11-14 19:17 ` Karol Herbst
2019-11-19 20:06 ` Dave Airlie
2019-11-19 21:49 ` Bjorn Helgaas
2019-11-19 22:26   ` Karol Herbst
2019-11-19 22:57     ` Bjorn Helgaas
2019-11-20 10:18     ` Mika Westerberg
2019-11-20 10:52       ` Rafael J. Wysocki
2019-11-20 11:22         ` Mika Westerberg
2019-11-20 11:48           ` Rafael J. Wysocki
2019-11-20 11:51             ` Karol Herbst
2019-11-20 12:06               ` Rafael J. Wysocki
2019-11-20 12:09                 ` Karol Herbst
2019-11-20 12:14                   ` Rafael J. Wysocki
2019-11-20 12:19                     ` Karol Herbst
2019-11-20 12:11                 ` Rafael J. Wysocki
2019-11-20 11:51           ` Mika Westerberg
2019-11-20 11:54             ` Karol Herbst
2019-11-20 11:58               ` Karol Herbst
2019-11-20 12:09                 ` Mika Westerberg
2019-11-20 12:11                   ` Karol Herbst
2019-11-20 15:15                     ` Mika Westerberg
2019-11-20 15:37                       ` Karol Herbst
2019-11-20 15:53                         ` Mika Westerberg
2019-11-20 16:23                           ` Mika Westerberg
2019-11-20 21:36                             ` Karol Herbst
2019-11-21 10:14                               ` Mika Westerberg
2019-11-21 11:03                                 ` Rafael J. Wysocki
2019-11-21 11:08                                   ` Rafael J. Wysocki
2019-11-21 11:15                                     ` Rafael J. Wysocki
2019-11-21 11:17                                   ` Mika Westerberg
2019-11-21 11:31                                     ` Rafael J. Wysocki
2019-11-20 21:37                           ` Rafael J. Wysocki
2019-11-20 21:40                             ` Karol Herbst
2019-11-20 22:29                               ` Rafael J. Wysocki
2019-11-21 11:28                                 ` Mika Westerberg
2019-11-21 11:34                                   ` Rafael J. Wysocki
2019-11-21 11:46                                     ` Mika Westerberg
2019-11-21 12:52                                       ` Mika Westerberg
2019-11-21 12:56                                         ` Karol Herbst
2019-11-21 15:43                                         ` Rafael J. Wysocki
2019-11-21 19:49                                           ` Mika Westerberg
2019-11-21 22:39                                             ` Rafael J. Wysocki
2019-11-21 22:50                                               ` Karol Herbst
2019-11-22  0:13                                                 ` Karol Herbst
2019-11-22  9:07                                                   ` Rafael J. Wysocki
2019-11-22 11:30                                                     ` Karol Herbst
2019-11-22 10:36                                               ` Mika Westerberg
2019-11-22 11:30                                                 ` Rafael J. Wysocki
2019-11-22 11:34                                                   ` Karol Herbst
2019-11-22 11:54                                                     ` Rafael J. Wysocki
2019-11-22 11:52                                                   ` Mika Westerberg
2019-11-22 12:15                                                     ` Rafael J. Wysocki
2019-11-21 12:52                                       ` Karol Herbst
2019-11-21 15:47                                         ` Rafael J. Wysocki
2019-11-21 16:06                                           ` Karol Herbst
2019-11-21 16:39                                             ` Rafael J. Wysocki
2019-11-26 23:10                                               ` Lyude Paul
2019-11-27 11:48                                                 ` Mika Westerberg
2019-11-27 11:51                                                   ` Karol Herbst
2019-11-27 19:51                                                     ` Lyude Paul
2019-12-09 11:17                                                       ` Karol Herbst
2019-12-09 11:38                                                         ` Rafael J. Wysocki
2019-12-09 12:24                                                           ` Karol Herbst [this message]
2019-12-10 19:58                                                           ` Dave Airlie
2019-12-10 20:49                                                             ` Karol Herbst
2020-01-13 15:31                                                               ` Karol Herbst

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACO55tu19-14nVnnCpWz3r3nf15j6tGWzHNBRmbbs2R6O4gMCA@mail.gmail.com \
    --to=kherbst@redhat.com \
    --cc=Mario.Limonciello@dell.com \
    --cc=airlied@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=helgaas@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=lyude@redhat.com \
    --cc=mika.westerberg@intel.com \
    --cc=nouveau@lists.freedesktop.org \
    --cc=rafael@kernel.org \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).