linux-pci.vger.kernel.org archive mirror
* regression: PCIe resume from suspend stalls I/O and causes interrupt storms in Linux 5.3-rc2 (5.2.5, 5.1.20) on Ryzen 7 1700/AMD X370 MSI board since 5817d78eba34f6c86f5462ae2c5212f80a013357, 5.2/5.3 w/ pcieIRQ loop.
@ 2019-08-02 19:03 Matthias Andree
  2019-08-05 12:27 ` Mika Westerberg
  0 siblings, 1 reply; 7+ messages in thread
From: Matthias Andree @ 2019-08-02 19:03 UTC (permalink / raw)
  To: linux-pci; +Cc: Mika Westerberg, Rafael J. Wysocki, Bjorn Helgaas

Greetings, 

Commit 5817d78eba34f6c86f5462ae2c5212f80a013357 (written by Mika
Westerberg) causes regressions on resume from S3 suspend on my MSI X370
w/ Ryzen 7 1700, which is, TTBOMK, a PCI Express 3.0 platform.
Consequences are hung disk and net I/O although re-login to GNOME works
on 5.1.20, albeit very slowly. The machine is unusable after resume from
that point.

5.2.5 and 5.3-rc2 will go into a tight loop of pcieport 0000:00:01.3:
PME: Spurious native interrupt! and need to be rebooted.
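
To gauge how tight such a loop is, one could count the messages per port in a saved kernel log. The following is only a minimal sketch: the function name count_spurious_pme and the sample timestamps are hypothetical, and only the message text itself is taken from the report above.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   [  200.000001] pcieport 0000:00:01.3: PME: Spurious native interrupt!
# The pattern is searched anywhere in the line, so a timestamp prefix is optional.
SPURIOUS_RE = re.compile(
    r"pcieport (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PME: Spurious native interrupt!"
)


def count_spurious_pme(lines):
    """Return a Counter mapping PCI address -> number of spurious PME messages."""
    counts = Counter()
    for line in lines:
        match = SPURIOUS_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; the timestamps are invented for illustration.
    sample = [
        "[  200.000001] pcieport 0000:00:01.3: PME: Spurious native interrupt!",
        "[  200.000002] pcieport 0000:00:01.3: PME: Spurious native interrupt!",
        "[  200.000003] pcieport 0000:00:03.1: AER: enabled with IRQ 29",
    ]
    for dev, n in count_spurious_pme(sample).most_common():
        print(f"{dev}: {n} spurious PME message(s)")
```

For a real log, the lines could come from a file saved with "dmesg > dmesg.txt" and passed in via open("dmesg.txt").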

bad: v5.3-rc2

good: v5.3-rc2-111-g97b00aff2c45 + "git revert 5817d78eba"

Reverting the commit shown above restores suspend functionality for me:
two S3 suspend/resume cycles work.

For details, more information (lspci, versions found) is at:

* Kernel Bugzilla, https://bugzilla.kernel.org/show_bug.cgi?id=204413

* Fedora/Redhat Bugzilla,
https://bugzilla.redhat.com/show_bug.cgi?id=1737046


Same findings for v5.2.5 on the stable kernel: reverting the relevant commit
(SHA is 5817d78eba34f6c86f5462ae2c5212f80a013357 there) also fixes
suspend/resume problems for me.

Let me know if you need me to pull out any further hardware or kernel
debug info, but please be specific with instructions - I am not a kernel
hacker (although I have been exposed to C for nearly 30 years and
Linux/FreeBSD for some 20 years). Pointing me to relevant URLs with
debug instructions is fine. I have a Git tree handy and this octocore
sitting here compiles a kernel in < 10 minutes.

Regards,

Matthias



* Re: regression: PCIe resume from suspend stalls I/O and causes interrupt storms in Linux 5.3-rc2 (5.2.5, 5.1.20) on Ryzen 7 1700/AMD X370 MSI board since 5817d78eba34f6c86f5462ae2c5212f80a013357, 5.2/5.3 w/ pcieIRQ loop.
  2019-08-02 19:03 regression: PCIe resume from suspend stalls I/O and causes interrupt storms in Linux 5.3-rc2 (5.2.5, 5.1.20) on Ryzen 7 1700/AMD X370 MSI board since 5817d78eba34f6c86f5462ae2c5212f80a013357, 5.2/5.3 w/ pcieIRQ loop Matthias Andree
@ 2019-08-05 12:27 ` Mika Westerberg
  2019-08-05 12:47   ` Bjorn Helgaas
  0 siblings, 1 reply; 7+ messages in thread
From: Mika Westerberg @ 2019-08-05 12:27 UTC (permalink / raw)
  To: Matthias Andree; +Cc: linux-pci, Rafael J. Wysocki, Bjorn Helgaas

On Fri, Aug 02, 2019 at 09:03:06PM +0200, Matthias Andree wrote:
> Greetings, 

Hi,

> Commit 5817d78eba34f6c86f5462ae2c5212f80a013357 (written by Mika
> Westerberg) causes regressions on resume from S3 suspend on my MSI X370
> w/ Ryzen 7 1700, which is, TTBOMK, a PCI Express 3.0 platform.
> Consequences are hung disk and net I/O although re-login to GNOME works
> on 5.1.20, albeit very slowly. The machine is unusable after resume from
> that point.
> 
> 5.2.5 and 5.3-rc2 will go into a tight loop of pcieport 0000:00:01.3:
> PME: Spurious native interrupt! and need to be rebooted.
> 
> bad: v5.3-rc2
> 
> good: v5.3-rc2-111-g97b00aff2c45 + "git revert 5817d78eba"
> 
> Reverting the commit shown above restores suspend functionality for me:
> two S3 suspend/resume cycles work.
> 
> For details, more information (lspci, versions found) is at:
> 
> * Kernel Bugzilla, https://bugzilla.kernel.org/show_bug.cgi?id=204413
> 
> * Fedora/Redhat Bugzilla,
> https://bugzilla.redhat.com/show_bug.cgi?id=1737046
> 
> 
> Same findings for v5.2.5 on the stable kernel: reverting the relevant commit
> (SHA is 5817d78eba34f6c86f5462ae2c5212f80a013357 there) also fixes
> suspend/resume problems for me.
> 
> Let me know if you need me to pull out any further hardware or kernel
> debug info, but please be specific with instructions - I am not a kernel
> hacker (although I have been exposed to C for nearly 30 years and
> Linux/FreeBSD for some 20 years). Pointing me to relevant URLs with
> debug instructions is fine. I have a Git tree handy and this octocore
> sitting here compiles a kernel in < 10 minutes.

Are you able to get dmesg after resume or is it completely dead? It
would help if we could see how long it tries to wait for the downstream
link by passing "pciepordrv.dyndbg" to the kernel command line.

Can you also try to revert 00ebf1348cb332941dab52948f29480592bfbe6a
("PCI/PME: Replace dev_printk(KERN_DEBUG) with dev_info()") so that it
does not spam dmesg too much?

Thanks!

* Re: regression: PCIe resume from suspend stalls I/O and causes interrupt storms in Linux 5.3-rc2 (5.2.5, 5.1.20) on Ryzen 7 1700/AMD X370 MSI board since 5817d78eba34f6c86f5462ae2c5212f80a013357, 5.2/5.3 w/ pcieIRQ loop.
  2019-08-05 12:27 ` Mika Westerberg
@ 2019-08-05 12:47   ` Bjorn Helgaas
  2019-08-05 13:00     ` Mika Westerberg
  0 siblings, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2019-08-05 12:47 UTC (permalink / raw)
  To: Mika Westerberg; +Cc: Matthias Andree, linux-pci, Rafael J. Wysocki

On Mon, Aug 05, 2019 at 03:27:51PM +0300, Mika Westerberg wrote:
> Are you able to get dmesg after resume or is it completely dead? It
> would help if we could see how long it tries to wait for the downstream
> link by passing "pciepordrv.dyndbg" to the kernel command line.

"pcieportdrv.dyndbg" (with "t"), I think.

* Re: regression: PCIe resume from suspend stalls I/O and causes interrupt storms in Linux 5.3-rc2 (5.2.5, 5.1.20) on Ryzen 7 1700/AMD X370 MSI board since 5817d78eba34f6c86f5462ae2c5212f80a013357, 5.2/5.3 w/ pcieIRQ loop.
  2019-08-05 12:47   ` Bjorn Helgaas
@ 2019-08-05 13:00     ` Mika Westerberg
  2019-08-05 14:01       ` Matthias Andree
  0 siblings, 1 reply; 7+ messages in thread
From: Mika Westerberg @ 2019-08-05 13:00 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Matthias Andree, linux-pci, Rafael J. Wysocki

On Mon, Aug 05, 2019 at 07:47:05AM -0500, Bjorn Helgaas wrote:
> On Mon, Aug 05, 2019 at 03:27:51PM +0300, Mika Westerberg wrote:
> > Are you able to get dmesg after resume or is it completely dead? It
> > would help if we could see how long it tries to wait for the downstream
> > link by passing "pciepordrv.dyndbg" to the kernel command line.
> 
> "pcieportdrv.dyndbg" (with "t"), I think.

Right, thanks for the correction.

* Re: regression: PCIe resume from suspend stalls I/O and causes interrupt storms in Linux 5.3-rc2 (5.2.5, 5.1.20) on Ryzen 7 1700/AMD X370 MSI board since 5817d78eba34f6c86f5462ae2c5212f80a013357, 5.2/5.3 w/ pcieIRQ loop.
  2019-08-05 13:00     ` Mika Westerberg
@ 2019-08-05 14:01       ` Matthias Andree
  2019-08-05 15:08         ` Mika Westerberg
  0 siblings, 1 reply; 7+ messages in thread
From: Matthias Andree @ 2019-08-05 14:01 UTC (permalink / raw)
  To: Mika Westerberg, Bjorn Helgaas; +Cc: linux-pci, Rafael J. Wysocki

On 05.08.19 at 15:00, Mika Westerberg wrote:
> On Mon, Aug 05, 2019 at 07:47:05AM -0500, Bjorn Helgaas wrote:
>> On Mon, Aug 05, 2019 at 03:27:51PM +0300, Mika Westerberg wrote:
>>> Are you able to get dmesg after resume or is it completely dead? It
>>> would help if we could see how long it tries to wait for the downstream
>>> link by passing "pciepordrv.dyndbg" to the kernel command line.
>> "pcieportdrv.dyndbg" (with "t"), I think.
> Right, thanks for the correction.

Hi Mika, Bjorn,

thanks for picking this up. I have used pcieportdrv.dyndbg=+p (not sure
if something else would have worked) and was lucky that I could transfer
dmesg to a USB stick before the machine went completely unusable.

$ grep pcieport dmesg.txt

[    0.698739] pcieport 0000:00:01.3: Signaling PME with IRQ 28
[    0.698799] pcieport 0000:00:01.3: AER: enabled with IRQ 28
[    0.698966] pcieport 0000:00:03.1: Signaling PME with IRQ 29
[    0.699017] pcieport 0000:00:03.1: AER: enabled with IRQ 29
[    0.699188] pcieport 0000:00:07.1: Signaling PME with IRQ 30
[    0.699230] pcieport 0000:00:07.1: AER: enabled with IRQ 30
[    0.699816] pcieport 0000:00:08.1: Signaling PME with IRQ 31
[    0.699860] pcieport 0000:00:08.1: AER: enabled with IRQ 31
[  119.637492] pcieport 0000:00:03.1: waiting downstream link for 100 ms
[  119.649285] pcieport 0000:00:08.1: waiting downstream link for 100 ms
[  119.649287] pcieport 0000:00:07.1: waiting downstream link for 100 ms
[  119.649376] pcieport 0000:00:01.3: waiting downstream link for 100 ms
[  119.803025] pcieport 0000:16:08.0: waiting downstream link for 100 ms
[  119.803031] pcieport 0000:16:01.0: waiting downstream link for 100 ms

sudo lspci -vv # uploaded to Kernel bug, see

(deep link:) https://bugzilla.kernel.org/attachment.cgi?id=284193

(bug:) https://bugzilla.kernel.org/show_bug.cgi?id=204413
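
The "waiting downstream link" lines in the excerpt above can be pulled out with a short script to see which ports wait, for how long, and how far apart the messages are. This is a minimal sketch, not a kernel tool: the names parse_link_waits and DMESG_EXCERPT are hypothetical, and the log lines are copied from the grep output in this message.

```python
import re

# Lines copied from the "grep pcieport dmesg.txt" output in this message.
DMESG_EXCERPT = """\
[    0.698739] pcieport 0000:00:01.3: Signaling PME with IRQ 28
[    0.698799] pcieport 0000:00:01.3: AER: enabled with IRQ 28
[  119.637492] pcieport 0000:00:03.1: waiting downstream link for 100 ms
[  119.649285] pcieport 0000:00:08.1: waiting downstream link for 100 ms
[  119.649287] pcieport 0000:00:07.1: waiting downstream link for 100 ms
[  119.649376] pcieport 0000:00:01.3: waiting downstream link for 100 ms
[  119.803025] pcieport 0000:16:08.0: waiting downstream link for 100 ms
[  119.803031] pcieport 0000:16:01.0: waiting downstream link for 100 ms
"""

# Captures the timestamp, the PCI address and the wait time in milliseconds.
WAIT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] "
    r"pcieport (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"waiting downstream link for (?P<ms>\d+) ms"
)


def parse_link_waits(lines):
    """Return a list of (timestamp_seconds, pci_address, wait_ms) tuples."""
    rows = []
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            rows.append(
                (float(match.group("ts")), match.group("dev"), int(match.group("ms")))
            )
    return rows


if __name__ == "__main__":
    rows = parse_link_waits(DMESG_EXCERPT.splitlines())
    for ts, dev, ms in rows:
        print(f"{ts:12.6f}s  {dev}  waits {ms} ms")
    if rows:
        # Time between the first and the last wait message in the excerpt.
        print(f"spread: {rows[-1][0] - rows[0][0]:.6f} s")
```

Lines that do not match the pattern, such as the "Signaling PME" and "AER" messages from boot, are skipped.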


* Re: regression: PCIe resume from suspend stalls I/O and causes interrupt storms in Linux 5.3-rc2 (5.2.5, 5.1.20) on Ryzen 7 1700/AMD X370 MSI board since 5817d78eba34f6c86f5462ae2c5212f80a013357, 5.2/5.3 w/ pcieIRQ loop.
  2019-08-05 14:01       ` Matthias Andree
@ 2019-08-05 15:08         ` Mika Westerberg
  2019-08-05 15:56           ` Matthias Andree
  0 siblings, 1 reply; 7+ messages in thread
From: Mika Westerberg @ 2019-08-05 15:08 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Bjorn Helgaas, linux-pci, Rafael J. Wysocki

On Mon, Aug 05, 2019 at 04:01:11PM +0200, Matthias Andree wrote:
> On 05.08.19 at 15:00, Mika Westerberg wrote:
> > On Mon, Aug 05, 2019 at 07:47:05AM -0500, Bjorn Helgaas wrote:
> >> On Mon, Aug 05, 2019 at 03:27:51PM +0300, Mika Westerberg wrote:
> >>> Are you able to get dmesg after resume or is it completely dead? It
> >>> would help if we could see how long it tries to wait for the downstream
> >>> link by passing "pciepordrv.dyndbg" to the kernel command line.
> >> "pcieportdrv.dyndbg" (with "t"), I think.
> > Right, thanks for the correction.
> 
> Hi Mika, Bjorn,
> 
> thanks for picking this up. I have used pcieportdrv.dyndbg=+p (not sure
> if something else would have worked) and was lucky that I could transfer
> dmesg to a USB stick before the machine went completely unusable.
> 
> $ grep pcieport dmesg.txt
> 
> [    0.698739] pcieport 0000:00:01.3: Signaling PME with IRQ 28
> [    0.698799] pcieport 0000:00:01.3: AER: enabled with IRQ 28
> [    0.698966] pcieport 0000:00:03.1: Signaling PME with IRQ 29
> [    0.699017] pcieport 0000:00:03.1: AER: enabled with IRQ 29
> [    0.699188] pcieport 0000:00:07.1: Signaling PME with IRQ 30
> [    0.699230] pcieport 0000:00:07.1: AER: enabled with IRQ 30
> [    0.699816] pcieport 0000:00:08.1: Signaling PME with IRQ 31
> [    0.699860] pcieport 0000:00:08.1: AER: enabled with IRQ 31
> [  119.637492] pcieport 0000:00:03.1: waiting downstream link for 100 ms
> [  119.649285] pcieport 0000:00:08.1: waiting downstream link for 100 ms
> [  119.649287] pcieport 0000:00:07.1: waiting downstream link for 100 ms
> [  119.649376] pcieport 0000:00:01.3: waiting downstream link for 100 ms
> [  119.803025] pcieport 0000:16:08.0: waiting downstream link for 100 ms
> [  119.803031] pcieport 0000:16:01.0: waiting downstream link for 100 ms

Can you also attach the dmesg.txt to the bugzilla entry?

* Re: regression: PCIe resume from suspend stalls I/O and causes interrupt storms in Linux 5.3-rc2 (5.2.5, 5.1.20) on Ryzen 7 1700/AMD X370 MSI board since 5817d78eba34f6c86f5462ae2c5212f80a013357, 5.2/5.3 w/ pcieIRQ loop.
  2019-08-05 15:08         ` Mika Westerberg
@ 2019-08-05 15:56           ` Matthias Andree
  0 siblings, 0 replies; 7+ messages in thread
From: Matthias Andree @ 2019-08-05 15:56 UTC (permalink / raw)
  Cc: linux-pci

On 05.08.19 at 17:08, Mika Westerberg wrote:
> Can you also attach the dmesg.txt to the bugzilla entry?

I chose to mail it privately.

