All of lore.kernel.org
 help / color / mirror / Atom feed
From: Karol Herbst <kherbst@redhat.com>
To: Linux PCI <linux-pci@vger.kernel.org>,
	Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ben Skeggs <bskeggs@redhat.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Lyude Paul <lyude@redhat.com>,
	nouveau <nouveau@lists.freedesktop.org>,
	dri-devel <dri-devel@lists.freedesktop.org>
Subject: Re: nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"
Date: Fri, 17 Jul 2020 00:10:39 +0200	[thread overview]
Message-ID: <CACO55tuA+XMgv=GREf178NzTLTHri4kyD5mJjKuDpKxExauvVg@mail.gmail.com> (raw)
In-Reply-To: <CACO55tsAEa5GXw5oeJPG=mcn+qxNvspXreJYWDJGZBy5v82JDA@mail.gmail.com>

On Tue, Jul 7, 2020 at 9:30 PM Karol Herbst <kherbst@redhat.com> wrote:
>
> Hi everybody,
>
> with the mentioned commit Nouveau isn't able to load firmware onto the
> GPU on one of my systems here. Even though the issue doesn't always
> happen I am quite confident this is the commit breaking it.
>
> I am still digging into the issue and trying to figure out what
> exactly breaks, but it shows up in different ways. Either we are not
> able to boot the engines on the GPU or the GPU becomes unresponsive.
> Btw, this is also a system where our runtime power management issue
> shows up, so maybe there is indeed something funky with the bridge
> controller.
>
> Just pinging you in case you have an idea on how this could break Nouveau
>
> most of the times it shows up like this:
> nouveau 0000:01:00.0: acr: AHESASC binary failed
>
> Sometimes it works at boot and fails at runtime resuming with random
> faults. So I will be investigating a bit more, but yeah... I am super
> sure the commit triggered this issue, no idea if it actually causes
> it.
>

so yeah.. I reverted that locally and never ran into issues again.
Still valid on latest 5.7. So can we get this reverted or properly
fixed? This breaks runtime pm for us on at least some hardware.

> git bisect log (had to do a second bisect, that's why the first bad
> and good commits appear a bit random):
>
> git bisect start
> # bad: [a92b984a110863b42a3abf32e3f049b02b19e350] clk: samsung:
> exynos5433: Add IGNORE_UNUSED flag to sclk_i2s1
> git bisect bad a92b984a110863b42a3abf32e3f049b02b19e350
> # good: [4da858c086433cd012c0bb16b5921f6fafe3f803] Merge branch
> 'linux-5.7' of git://github.com/skeggsb/linux into drm-fixes
> git bisect good 4da858c086433cd012c0bb16b5921f6fafe3f803
> # good: [d5dfe4f1b44ed532653c2335267ad9599c8a698e] Merge tag
> 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
> git bisect good d5dfe4f1b44ed532653c2335267ad9599c8a698e
> # good: [b24e451cfb8c33ef5b8b4a80e232706b089914fb] ipv6: fix
> IPV6_ADDRFORM operation logic
> git bisect good b24e451cfb8c33ef5b8b4a80e232706b089914fb
> # good: [d843ffbce812742986293f974d55ba404e91872f] nvmet: fix memory
> leak when removing namespaces and controllers concurrently
> git bisect good d843ffbce812742986293f974d55ba404e91872f
> # good: [be66f10a60e3ec0b589898f78a428bcb34095730] staging: wfx: fix
> output of rx_stats on big endian hosts
> git bisect good be66f10a60e3ec0b589898f78a428bcb34095730
> # good: [a4482984c41f5cc1d217aa189fe51bbbc0500f98] s390/qdio:
> consistently restore the IRQ handler
> git bisect good a4482984c41f5cc1d217aa189fe51bbbc0500f98
> # good: [bec32a54a4de62b46466f4da1beb9ddd42db81b8] f2fs: fix potential
> use-after-free issue
> git bisect good bec32a54a4de62b46466f4da1beb9ddd42db81b8
> # bad: [044aaaa8b1b15adb397ce423a6d97920a46b3893] habanalabs: increase
> timeout during reset
> git bisect bad 044aaaa8b1b15adb397ce423a6d97920a46b3893
> # good: [6fe8ed270763a6a2e350bf37eee0f3857482ed48] arm64: dts: qcom:
> db820c: Fix invalid pm8994 supplies
> git bisect good 6fe8ed270763a6a2e350bf37eee0f3857482ed48
> # good: [363e8bfc96b4e9d9e0a885408cecaf23df468523] tty: n_gsm: Fix
> waking up upper tty layer when room available
> git bisect good 363e8bfc96b4e9d9e0a885408cecaf23df468523
> # bad: [afaff825e3a436f9d1e3986530133b1c91b54cd1] PCI/PM: Assume ports
> without DLL Link Active train links in 100 ms
> git bisect bad afaff825e3a436f9d1e3986530133b1c91b54cd1
> # good: [be0ed15d88c65de0e28ff37a3b242e65a782fd98] HID: Add quirks for
> Trust Panora Graphic Tablet
> git bisect good be0ed15d88c65de0e28ff37a3b242e65a782fd98
> # first bad commit: [afaff825e3a436f9d1e3986530133b1c91b54cd1] PCI/PM:
> Assume ports without DLL Link Active train links in 100 ms


WARNING: multiple messages have this Message-ID (diff)
From: Karol Herbst <kherbst@redhat.com>
To: Linux PCI <linux-pci@vger.kernel.org>,
	 Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	nouveau <nouveau@lists.freedesktop.org>,
	Ben Skeggs <bskeggs@redhat.com>,
	dri-devel <dri-devel@lists.freedesktop.org>
Subject: Re: nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"
Date: Fri, 17 Jul 2020 00:10:39 +0200	[thread overview]
Message-ID: <CACO55tuA+XMgv=GREf178NzTLTHri4kyD5mJjKuDpKxExauvVg@mail.gmail.com> (raw)
In-Reply-To: <CACO55tsAEa5GXw5oeJPG=mcn+qxNvspXreJYWDJGZBy5v82JDA@mail.gmail.com>

On Tue, Jul 7, 2020 at 9:30 PM Karol Herbst <kherbst@redhat.com> wrote:
>
> Hi everybody,
>
> with the mentioned commit Nouveau isn't able to load firmware onto the
> GPU on one of my systems here. Even though the issue doesn't always
> happen I am quite confident this is the commit breaking it.
>
> I am still digging into the issue and trying to figure out what
> exactly breaks, but it shows up in different ways. Either we are not
> able to boot the engines on the GPU or the GPU becomes unresponsive.
> Btw, this is also a system where our runtime power management issue
> shows up, so maybe there is indeed something funky with the bridge
> controller.
>
> Just pinging you in case you have an idea on how this could break Nouveau
>
> most of the times it shows up like this:
> nouveau 0000:01:00.0: acr: AHESASC binary failed
>
> Sometimes it works at boot and fails at runtime resuming with random
> faults. So I will be investigating a bit more, but yeah... I am super
> sure the commit triggered this issue, no idea if it actually causes
> it.
>

so yeah.. I reverted that locally and never ran into issues again.
Still valid on latest 5.7. So can we get this reverted or properly
fixed? This breaks runtime pm for us on at least some hardware.

> git bisect log (had to do a second bisect, that's why the first bad
> and good commits appear a bit random):
>
> git bisect start
> # bad: [a92b984a110863b42a3abf32e3f049b02b19e350] clk: samsung:
> exynos5433: Add IGNORE_UNUSED flag to sclk_i2s1
> git bisect bad a92b984a110863b42a3abf32e3f049b02b19e350
> # good: [4da858c086433cd012c0bb16b5921f6fafe3f803] Merge branch
> 'linux-5.7' of git://github.com/skeggsb/linux into drm-fixes
> git bisect good 4da858c086433cd012c0bb16b5921f6fafe3f803
> # good: [d5dfe4f1b44ed532653c2335267ad9599c8a698e] Merge tag
> 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
> git bisect good d5dfe4f1b44ed532653c2335267ad9599c8a698e
> # good: [b24e451cfb8c33ef5b8b4a80e232706b089914fb] ipv6: fix
> IPV6_ADDRFORM operation logic
> git bisect good b24e451cfb8c33ef5b8b4a80e232706b089914fb
> # good: [d843ffbce812742986293f974d55ba404e91872f] nvmet: fix memory
> leak when removing namespaces and controllers concurrently
> git bisect good d843ffbce812742986293f974d55ba404e91872f
> # good: [be66f10a60e3ec0b589898f78a428bcb34095730] staging: wfx: fix
> output of rx_stats on big endian hosts
> git bisect good be66f10a60e3ec0b589898f78a428bcb34095730
> # good: [a4482984c41f5cc1d217aa189fe51bbbc0500f98] s390/qdio:
> consistently restore the IRQ handler
> git bisect good a4482984c41f5cc1d217aa189fe51bbbc0500f98
> # good: [bec32a54a4de62b46466f4da1beb9ddd42db81b8] f2fs: fix potential
> use-after-free issue
> git bisect good bec32a54a4de62b46466f4da1beb9ddd42db81b8
> # bad: [044aaaa8b1b15adb397ce423a6d97920a46b3893] habanalabs: increase
> timeout during reset
> git bisect bad 044aaaa8b1b15adb397ce423a6d97920a46b3893
> # good: [6fe8ed270763a6a2e350bf37eee0f3857482ed48] arm64: dts: qcom:
> db820c: Fix invalid pm8994 supplies
> git bisect good 6fe8ed270763a6a2e350bf37eee0f3857482ed48
> # good: [363e8bfc96b4e9d9e0a885408cecaf23df468523] tty: n_gsm: Fix
> waking up upper tty layer when room available
> git bisect good 363e8bfc96b4e9d9e0a885408cecaf23df468523
> # bad: [afaff825e3a436f9d1e3986530133b1c91b54cd1] PCI/PM: Assume ports
> without DLL Link Active train links in 100 ms
> git bisect bad afaff825e3a436f9d1e3986530133b1c91b54cd1
> # good: [be0ed15d88c65de0e28ff37a3b242e65a782fd98] HID: Add quirks for
> Trust Panora Graphic Tablet
> git bisect good be0ed15d88c65de0e28ff37a3b242e65a782fd98
> # first bad commit: [afaff825e3a436f9d1e3986530133b1c91b54cd1] PCI/PM:
> Assume ports without DLL Link Active train links in 100 ms

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  reply	other threads:[~2020-07-16 22:10 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-07 19:30 nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms" Karol Herbst
2020-07-16 22:10 ` Karol Herbst [this message]
2020-07-16 22:10   ` Karol Herbst
2020-07-16 23:54   ` Bjorn Helgaas
2020-07-16 23:54     ` Bjorn Helgaas
2020-07-17  0:43     ` Karol Herbst
2020-07-17  0:43       ` Karol Herbst
2020-07-17  0:43       ` Karol Herbst
2020-07-17 11:32       ` Karol Herbst
2020-07-17 11:32         ` Karol Herbst
2020-07-21 12:22         ` Mika Westerberg
2020-07-21 12:22           ` Mika Westerberg
2020-07-21 15:01           ` Lyude Paul
2020-07-21 15:01             ` Lyude Paul
2020-07-21 15:27             ` Mika Westerberg
2020-07-21 15:27               ` Mika Westerberg
2020-07-21 16:00               ` Lyude Paul
2020-07-21 16:00                 ` Lyude Paul
2020-07-21 18:24                 ` Lyude Paul
2020-07-21 18:24                   ` Lyude Paul
2020-07-21 18:24                   ` Lyude Paul
2020-07-22  9:23                   ` Mika Westerberg
2020-07-22  9:23                     ` Mika Westerberg
2020-07-21 18:37               ` Patrick Volkerding
2020-07-21 18:37                 ` Patrick Volkerding
2020-07-22  9:25                 ` Mika Westerberg
2020-07-22  9:25                   ` Mika Westerberg
2020-07-22  9:25                   ` Mika Westerberg
2020-07-23 20:30                   ` Karol Herbst
2020-07-23 20:30                     ` Karol Herbst
2020-07-24  9:57                     ` Mika Westerberg
2020-07-24  9:57                       ` Mika Westerberg
2020-07-24 14:35                       ` Bjorn Helgaas
2020-07-24 14:35                         ` Bjorn Helgaas
2020-07-17 14:43       ` Sasha Levin
2020-07-17 14:43         ` Sasha Levin
2020-07-17 22:59         ` Bjorn Helgaas
2020-07-17 22:59           ` Bjorn Helgaas
2020-07-17 19:04     ` Lyude Paul
2020-07-17 19:04       ` Lyude Paul
2020-07-17 19:04       ` Lyude Paul
2020-07-17 19:52       ` Lukas Wunner
2020-07-17 19:52         ` Lukas Wunner
2020-07-21 13:08         ` Mika Westerberg
2020-07-21 13:08           ` Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACO55tuA+XMgv=GREf178NzTLTHri4kyD5mJjKuDpKxExauvVg@mail.gmail.com' \
    --to=kherbst@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=bskeggs@redhat.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lyude@redhat.com \
    --cc=mika.westerberg@linux.intel.com \
    --cc=nouveau@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.