linux-kernel.vger.kernel.org archive mirror
From: Peter Wu <peter@lekensteyn.nl>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Roland Singer <roland.singer@desertbit.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org, dri-devel@lists.freedesktop.org
Subject: Re: Kernel Freeze with American Megatrends BIOS
Date: Tue, 30 Aug 2016 21:53:37 +0200
Message-ID: <20160830195337.GA18805@al>
In-Reply-To: <20160829160210.GA24451@localhost>

On Mon, Aug 29, 2016 at 11:02:10AM -0500, Bjorn Helgaas wrote:
> [+cc linux-acpi, linux-kernel, dri-devel]
> 
> Hi Roland,
> 
> I have no idea how to debug this problem.  Are you seeing something
> that suggests it may be a PCI problem?

Yes, I suspect there is an ACPI and/or PCI problem, possibly
device-specific. Steps to reproduce on the affected machines:

 1. Load nouveau.
 2. Wait for it to runtime suspend.
 3. Invoke 'lspci'; this resumes the Nvidia PCI device via nouveau.
 4. lspci never returns; a few moments later an AML_INFINITE_LOOP is
    reported.
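
To confirm that the device really is runtime suspended before invoking
lspci, the runtime PM state can be read from sysfs. A minimal sketch,
assuming the Nvidia device sits at 0000:01:00.0 (an assumption, adjust
for your machine); the helper name and its "root" parameter are
hypothetical and only exist to make the function easy to test. Reading
this attribute should not itself resume the device, unlike the config
space access done by lspci:

```python
from pathlib import Path

# Standard sysfs location of PCI devices on Linux.
SYSFS_PCI = Path("/sys/bus/pci/devices")


def runtime_status(bdf, root=SYSFS_PCI):
    """Return the runtime PM status of a PCI device (e.g. "active" or
    "suspended"), or None if the attribute cannot be read.

    bdf  -- domain:bus:device.function, e.g. "0000:01:00.0" (assumed)
    root -- directory holding the per-device sysfs directories
    """
    attr = Path(root) / bdf / "power" / "runtime_status"
    try:
        return attr.read_text().strip()
    except OSError:
        return None
```

Calling runtime_status("0000:01:00.0") before step 3 is expected to
return "suspended" once nouveau has powered the device down.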

If you use the external bbswitch module, the effect is the same. I have
been trying to debug this for some time on nouveau with no luck. The
PCI/PM D3cold patches from Mika make no difference.

Runtime resume via nouveau triggers some ACPI methods (I'll assume the
Windows 8-style PR method and take the Clevo P651 as example):

    \_SB.PCI0.PEG0.PG00._ON () ->
        \_SB.PCI0.PGON (0)

Then:

    Method (PGON, 1, Serialized) {
        PION = Arg0     // note: 0 for PG00
        // ...
        If ((OSYS != 0x07DF)) { /* Not Windows 2015 (Windows 10), see below */ }
        Else {
            LKEN (PION)
        }
        // this is the infinite loop: it tries to bring the PCIe link to
        // full speed, but fails to do so.
        While ((\_SB.PCI0.PEG0.LNKS < 0x07)) {
            Local0 = 0x20
            While (Local0) {
                If ((\_SB.PCI0.PEG0.LNKS < 0x07)) {
                    Stall (0x64)
                    Local0--
                } Else { Break }
            }
            If ((Local0 == Zero)) {
                \_SB.PCI0.PEG0.RTLK = One
                Stall (0x64)
            }
        }
        // ...
    }
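
To make the control flow of that loop easier to follow, here is a small
Python model of it. This is only a sketch of my reading of the ASL
above: read_lnks stands in for reading \_SB.PCI0.PEG0.LNKS, and the
max_outer cap is my addition for testing (the firmware has no such
limit, which is exactly the problem):

```python
def pgon_link_wait(read_lnks, max_outer=None):
    """Model the LNKS wait loop from PGON.

    read_lnks -- callable returning the current LNKS value
    max_outer -- optional cap on outer-loop passes (NOT in the firmware)

    Returns (retrains, stalls): how often RTLK was set and how many
    Stall (0x64) calls (100 us busy-waits each) were made.
    """
    retrains = 0
    stalls = 0
    outer = 0
    while read_lnks() < 0x07:
        if max_outer is not None and outer >= max_outer:
            raise RuntimeError("link never reached LNKS >= 7")
        outer += 1
        local0 = 0x20
        while local0:
            if read_lnks() < 0x07:
                stalls += 1          # Stall (0x64)
                local0 -= 1
            else:
                break
        if local0 == 0:
            retrains += 1            # RTLK = One
            stalls += 1              # Stall (0x64)
    return retrains, stalls


def stuck_link():
    """A link that never comes up, as seen on the affected machines."""
    return 0
```

With stuck_link as the reader and no cap, the function never returns,
which matches the AML_INFINITE_LOOP report. Each outer pass costs 33
stalls (32 in the inner loop plus one after setting RTLK).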

Without any workaround, this piece of code is invoked:

    Method (LKEN, 1, NotSerialized) {
        Local3 = (CPEX & 0x0F)  // CPEX at 0x5ff9be7f and has value 000506e3
        If ((Local3 == Zero)) {
            /* Similar to below, but with Q0L0 -> P0L0 (register 0xBC bit 6) */
        } ElseIf ((Local3 != Zero)) {
            If ((Arg0 == Zero)) {
                /* Enter L0 Activate state.
                 * (LKDS tries to enter L2, deep-energy-saving state.) */
                Q0L0 = One      // register 0x249 bit 0; \_SB.PCI0.OPG0.Q0L0 00:01.0
                Sleep (0x10)
                Local0 = Zero
                While (Q0L0) {
                    If ((Local0 > 0x04)) { Break }
                    Sleep (0x10)
                    Local0++
                }
            } Else { /* other cases, but we are only interested in PGON(0) */ }
        }
    }
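
As a quick check of which branch is taken and how long LKEN can block,
the arithmetic can be worked through in Python. This is a sketch based
only on the listing above; the function name is made up, and it assumes
the worst case where Q0L0 never clears:

```python
CPEX = 0x000506E3  # value quoted in the LKEN listing


def lken_worst_case_ms(sleep_ms=0x10, limit=0x04):
    """Total Sleep() time in milliseconds if Q0L0 stays set."""
    total = sleep_ms              # Sleep (0x10) right after Q0L0 = One
    local0 = 0
    while True:                   # While (Q0L0), with Q0L0 stuck at 1
        if local0 > limit:        # If ((Local0 > 0x04)) { Break }
            break
        total += sleep_ms         # Sleep (0x10)
        local0 += 1               # Local0++
    return total


local3 = CPEX & 0x0F              # low nibble of CPEX
print(local3)                     # 3, so the ElseIf (Local3 != Zero) path
print(lken_worst_case_ms())       # 96
```

So LKEN itself is bounded: it gives up after at most 96 ms and returns
to PGON, where the unbounded LNKS loop then takes over.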

The acpi_osi="!Windows 2015" workaround will invoke this instead:

    If ((OSYS != 0x07DF)) {
        If ((PION == Zero)) {
            P0AP = Zero  /* PGOF writes 3 */
            P0RM = Zero  /* PGOF writes 1 */
        }
        If ((PBGE != Zero)) { /* Observed to be false (PBGE == 0) */
            If (SBDL (PION)) {
                PUAB (PION)
                CBDL = GUBC (PION)
                MBDL = GMXB (PION)
                If ((CBDL > MBDL)) {
                    CBDL = MBDL /* \_SB_.PCI0.MBDL */
                }
                PDUB (PION, CBDL)
            }
        }
        If ((PION == Zero)) {
            P0LD = Zero     /* Link Disable = 0, PGOF sets 1 instead. */
            P0TR = One      /* Train? (PGOF does not set this). */
            TCNT = Zero
            While ((TCNT < LDLY)) { /* LDLY = 300 */
                If ((P0VC == Zero)) {
                    /* VC Negotiation Pending 0 means VC negotiation is complete. */
                    Break
                }
                Sleep (0x10)
                TCNT += 0x10 /* At most 19 iterations, sleeping for 304ms. */
            }
        }
    }
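
The iteration count in the comment above can be verified with a small
model of the polling loop. Again just a sketch: the function name and
the p0vc_clears_after parameter are hypothetical, used to simulate the
point at which P0VC reads as zero:

```python
def vc_poll(ldly=300, step=0x10, p0vc_clears_after=None):
    """Model the TCNT/LDLY polling loop.

    p0vc_clears_after -- number of sleeps after which P0VC reads 0,
                         or None if it never clears
    Returns (iterations, slept_ms).
    """
    tcnt = 0
    iterations = 0
    while tcnt < ldly:                        # While ((TCNT < LDLY))
        if p0vc_clears_after is not None and iterations >= p0vc_clears_after:
            break                             # P0VC == Zero
        tcnt += step                          # Sleep (0x10); TCNT += 0x10
        iterations += 1
    return iterations, tcnt


print(vc_poll())                        # (19, 304)
print(vc_poll(p0vc_clears_after=2))     # (2, 32)
```

Unlike the LNKS loop in PGON, this one has a timeout, so a link that
fails to train costs about 304 ms here instead of hanging.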

The comments above are my own interpretation based on the acpidumps I
extracted from the machine. These notes and ACPI tables can be found at
https://github.com/Lekensteyn/acpi-stuff/blob/master/Clevo-P651RA/notes.txt
https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_P651RA

Other affected devices have similar code; the differences are small:
 - No check for LNKS (avoids the infinite loop, but device is still off)
 - Instead of a check for != "Windows 2015", they check for == "Windows
   2009" or even for == "Windows 2009" || "Windows 2013" (Dell Inspiron
   7559).

The tested kernels (with bbswitch or nouveau) were Linux 4.4.0, 4.6,
4.7 (nouveau + PCI/PM + nouveau PR patches). The PCIe device is
something from the GTX 9xxM family in all cases.

I have a bunch of PCI config dumps from Windows and Linux, but there is
nothing extraordinary in them. I also did an ACPI trace via a
Checked/Debug build of Windows, but it just confirms that the ACPI
method we use for the Nvidia device is the correct one.

Let me know if you need more information; I would be glad to provide it.

Kind regards,
Peter

> On Tue, Aug 23, 2016 at 11:23:45AM +0200, Roland Singer wrote:
> > Hi,
> > 
> > hope somebody can help me fix this kernel problem which affects the following machines:
> > 
> > - Clevo P651RA (i7-6700HQ/GTX 965M, part of the P6xxRx family which are also affected)
> > - MSI GE62 Apache Pro (i7-6700HQ/GTX 960M)
> > - Gigabyte P35V5 (i7-6700HQ/GTX 970M)
> > - Razer Blade 14" (2016) (i7-6700HQ/GTX 970M) (BIOS 5.11, 04/07/2016)
> > 
> > 
> > The kernel freezes if the graphical user session (Xorg & Wayland) is
> > started with a switched off discrete GPU card (NVIDIA).
> > If the discrete GPU is switched off after the graphical session start,
> > then everything works as expected, until the graphical session is restarted.
> > 
> > This problem seems to be linked to specific BIOS settings. If the computer
> > is started with the following command line:
> > 
> > acpi_osi=! acpi_osi="Windows 2009"
> > 
> > then the kernel freeze does not occur anymore. However this required a special
> > ACPI DSDT firmware patch for the Razer Blade 2016 laptop:
> > 
> > https://github.com/m4ng0squ4sh/razer_blade_14_2016_acpi_dsdt
> > 
> > I strongly recommend to fix this in the kernel and I am ready to help and solve
> > this problem with some help.
> > 
> > Here is a link to the GitHub issue with further information:
> > 
> > https://github.com/Bumblebee-Project/Bumblebee/issues/764#issuecomment-241212595
> > 
> > Here is some more detailed information:
> > 
> > https://github.com/Lekensteyn/acpi-stuff/blob/master/Clevo-P651RA/notes.txt
> > 
> > Hope somebody can help.
