platform-driver-x86.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
@ 2022-07-09 18:24 bugzilla-daemon
  2022-07-09 18:26 ` [Bug 216230] " bugzilla-daemon
                   ` (24 more replies)
  0 siblings, 25 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-09 18:24 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

            Bug ID: 216230
           Summary: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when
                    s2idle is enabled
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.18.10
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Platform_x86
          Assignee: drivers_platform_x86@kernel-bugs.osdl.org
          Reporter: madcatx@atlas.cz
        Regression: No

Whenever I power up my T14 Gen1 TP with AMD chip, I get the following error
about 4 seconds after boot

[    4.304639] kernel: irq 9: nobody cared (try booting with the "irqpoll"
option)
[    4.304652] kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.18.9-arch1-1
#1 137f0035b2ece06cb65382579db27e9de66af504
[    4.304654] kernel: Hardware name: LENOVO 20UDS02D00/20UDS02D00, BIOS
R1BET71W(1.40 ) 04/05/2022
[    4.304655] kernel: Call Trace:
[    4.304656] kernel:  <IRQ>
[    4.304658] kernel:  dump_stack_lvl+0x48/0x5d
[    4.304662] kernel:  __report_bad_irq+0x35/0xaa
[    4.304664] kernel:  note_interrupt.cold+0xa/0x65
[    4.304666] kernel:  handle_irq_event+0x72/0x80
[    4.304669] kernel:  handle_fasteoi_irq+0x8e/0x1f0
[    4.304671] kernel:  __common_interrupt+0x46/0xa0
[    4.304674] kernel:  common_interrupt+0x81/0xa0
[    4.304677] kernel:  </IRQ>
[    4.304677] kernel:  <TASK>
[    4.304678] kernel:  asm_common_interrupt+0x22/0x40
[    4.304681] kernel: RIP: 0010:cpuidle_enter_state+0xdc/0x380
[    4.304684] kernel: Code: 00 00 31 ff e8 95 70 80 ff 45 84 ff 74 16 9c 58 0f
1f 40 00 f6 c4 02 0f 85 8f 02 00 00 31 ff e8 fa 44 86 ff fb 0f 1f 44 00 00 <45>
85 f6 0f 88 22 01 00 00 49 63 ce 48 8d 04 49 48 8d 04 81 49 8d
[    4.304685] kernel: RSP: 0018:ffffb77f401a7e90 EFLAGS: 00000246
[    4.304687] kernel: RAX: ffff8f606f872cc0 RBX: 0000000000000003 RCX:
0000000000000000
[    4.304688] kernel: RDX: 0000000100930767 RSI: fffffffc2980382f RDI:
0000000000000000
[    4.304688] kernel: RBP: ffff8f5980bcfc00 R08: 0000000000000002 R09:
000000004b6f7f5c
[    4.304689] kernel: R10: 0000000000000d4c R11: 0000000000000002 R12:
ffffffff94d51300
[    4.304690] kernel: R13: 0000000100930767 R14: 0000000000000002 R15:
0000000000000000
[    4.304692] kernel:  cpuidle_enter+0x2d/0x40
[    4.304693] kernel:  do_idle+0x1ba/0x220
[    4.304695] kernel:  cpu_startup_entry+0x1d/0x20
[    4.304696] kernel:  start_secondary+0x11c/0x140
[    4.304699] kernel:  secondary_startup_64_no_verify+0xd5/0xdb
[    4.304702] kernel:  </TASK>

When the machine boots up, I can check /proc/interrupts and get this for IRQ 9

  9:          0     100076          0          0          0          0         
0          0          0          0          0          0          0          0 
        0          0  IR-IO-APIC    9-fasteoi   acpi, pinctrl_amd

Rather interestingly, when I reboot the machine, I don't get any IRQ-related
errors during boot and everything works fine. /proc/interrupts now looks like
this

 9:          0         87          0          0          0          0         
0          0          0          0          0          0          0          0 
        0          0  IR-IO-APIC    9-fasteoi   acpi, pinctrl_amd

The number of interrupts very slowly increases as the computer runs.

This odd IRQ9 error that happens only on the first cold boot happens only when
I enable "Windows sleep", aka s2idle, mode in BIOS. Switching to S3 ("Linux
sleep") takes care of the IRQ problem but I'd rather stick to s2idle because
the latest Lenovo firmware has some issues that go away when s2idle is enabled.

Thanks!

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
@ 2022-07-09 18:26 ` bugzilla-daemon
  2022-07-10 15:52 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-09 18:26 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

madcatx@atlas.cz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|5.18.10                     |5.18.9

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
  2022-07-09 18:26 ` [Bug 216230] " bugzilla-daemon
@ 2022-07-10 15:52 ` bugzilla-daemon
  2022-07-11  6:56 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-10 15:52 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

Mario Limonciello (AMD) (mario.limonciello@amd.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
                 CC|                            |markpearson@lenovo.com
           Hardware|All                         |AMD

--- Comment #1 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
I've got a handful of questions that should help clarify what's going on.

1) Is this a regression from a kernel upgrade?  What kernel introduced it?
2) Is it a regression from a BIOS upgrade?  What BIOS didn't have it?  Can you
downgrade to that and it goes away?
3) Does s2idle still work?  What indicates it did or didn't is whether this
message appears on resume:
https://github.com/torvalds/linux/blob/master/drivers/platform/x86/amd-pmc.c#L361

In addition to those questions the logs I would like to see:
* A full dmesg log including a suspend cycle.
* /sys/kernel/debug/gpio from a cold boot (that the problem reproduces)
* /sys/kernel/debug/gpio from a warm boot (that the problem doesn't reproduce)

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
  2022-07-09 18:26 ` [Bug 216230] " bugzilla-daemon
  2022-07-10 15:52 ` bugzilla-daemon
@ 2022-07-11  6:56 ` bugzilla-daemon
  2022-07-11  6:56 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-11  6:56 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #2 from madcatx@atlas.cz ---
Created attachment 301385
  --> https://bugzilla.kernel.org/attachment.cgi?id=301385&action=edit
/sys/kernel/debug/gpio when the issue does manifest

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (2 preceding siblings ...)
  2022-07-11  6:56 ` bugzilla-daemon
@ 2022-07-11  6:56 ` bugzilla-daemon
  2022-07-11  6:57 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-11  6:56 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #3 from madcatx@atlas.cz ---
Created attachment 301386
  --> https://bugzilla.kernel.org/attachment.cgi?id=301386&action=edit
/sys/kernel/debug/gpio when there is no issue

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (3 preceding siblings ...)
  2022-07-11  6:56 ` bugzilla-daemon
@ 2022-07-11  6:57 ` bugzilla-daemon
  2022-07-11  7:02 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-11  6:57 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #4 from madcatx@atlas.cz ---
Created attachment 301387
  --> https://bugzilla.kernel.org/attachment.cgi?id=301387&action=edit
Full dmesg log with suspend cycle

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (4 preceding siblings ...)
  2022-07-11  6:57 ` bugzilla-daemon
@ 2022-07-11  7:02 ` bugzilla-daemon
  2022-07-11 16:09 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-11  7:02 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #5 from madcatx@atlas.cz ---
Thanks for looking into this.

It turns out that the issue is more difficult to reproduce than I'd originally
thought. I saw a few cold boots that went fine and one warm boot when the irq9
fault happened. It looks like the cold/warm boot was just a red herring.

My machine is affected by the NVMe slow wakeup bug so I only started to use
s2idle since kernel 5.18. As such I don't know if this is a regression. I'll
see if I can downgrade the firmware if you think that'd help you track this
down. FTR I'm on version 1.40 which seems to be the latest one as of now.

I attached the requested logs.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (5 preceding siblings ...)
  2022-07-11  7:02 ` bugzilla-daemon
@ 2022-07-11 16:09 ` bugzilla-daemon
  2022-07-11 16:30 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-11 16:09 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #6 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
The tough thing with this bug is that in this design the IRQ is shared both for
the ACPI SCI as well as for the GPIO controller.  There are patches in the
kernel's s2idle handling created specifically for that case.  But it means when
you're dealing with a spurious interrupt you don't know which device it was
REALLY destined for.

Looking at your GPIO output it seems that pin 32 switches from output pin in
the good scenario to input pin in the bad scenario.  Can you try to correlate
if this happens every single time you see a problem or it's also a red herring?

Good 
> pin32     interrupt is disabled| interrupt is masked| disable wakeup in S0i3
> state| disable wakeup in S3 state|
 disable wakeup in S4/S5 state|     pull-up is disabled| Pull-down is disabled|
output is high| output is enabled| debouncing filter disabled|   0xc50000

Bad
> pin32     interrupt is disabled| interrupt is masked| disable wakeup in S0i3
> state| disable wakeup in S3 state|
 disable wakeup in S4/S5 state| input is high|   pull-up is disabled| Pull-down
is disabled|   output is disabled| debouncing filter disabled|   0x50000

> My machine is affected by the NVMe slow wakeup bug so I only started to use
> s2idle since kernel 5.18.

Sorry what bug?  Are you talking about
https://github.com/torvalds/linux/commit/455cd867b85b53fd3602345f9b8a8facc551adc9
?

> I'll see if I can downgrade the firmware if you think that'd help you track
> this down. FTR I'm on version 1.40 which seems to be the latest one as of
> now.

> I attached the requested logs.

I didn't see the issue listed there, I was hoping to understand the timing of
it.

Something I'm wondering is if maybe it's an IRQ destined for the GPIO
controller but the GPIO controller driver isn't yet loaded?  This could be
particularly relevant if it's not in your initramfs but rather loaded after you
entered your encryption key (noticed root=/dev/mapper/cryptroot)

Could you please turn on dynamic debugging for pinctrl_amd (pinctrl.dyndbg=+p
on your kernel command line )and share a log with it reproducing?

> pcie_aspm=force

FYI - this is generally not a safe thing to do.  If you remove this from your
kernel command line can you reproduce the issue?

> As such I don't know if this is a regression. I'll see if I can downgrade the
> firmware if you think that'd help you track this down. FTR I'm on version
> 1.40 which seems to be the latest one as of now.

I think Mark will have to comment whether it's something they're aware of,
tracking, etc regression wise.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (6 preceding siblings ...)
  2022-07-11 16:09 ` bugzilla-daemon
@ 2022-07-11 16:30 ` bugzilla-daemon
  2022-07-11 16:36 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-11 16:30 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #7 from madcatx@atlas.cz ---
(In reply to Mario Limonciello (AMD) from comment #6)
> The tough thing with this bug is that in this design the IRQ is shared both
> for the ACPI SCI as well as for the GPIO controller.  There are patches in
> the kernel's s2idle handling created specifically for that case.  But it
> means when you're dealing with a spurious interrupt you don't know which
> device it was REALLY destined for.
> 
> Looking at your GPIO output it seems that pin 32 switches from output pin in
> the good scenario to input pin in the bad scenario.  Can you try to
> correlate if this happens every single time you see a problem or it's also a
> red herring?
> 
> Good 
> > pin32     interrupt is disabled| interrupt is masked| disable wakeup in
> S0i3
> > state| disable wakeup in S3 state|
>  disable wakeup in S4/S5 state|     pull-up is disabled| Pull-down is
> disabled| output is high| output is enabled| debouncing filter disabled|  
> 0xc50000
> 
> Bad
> > pin32     interrupt is disabled| interrupt is masked| disable wakeup in
> S0i3
> > state| disable wakeup in S3 state|
>  disable wakeup in S4/S5 state| input is high|   pull-up is disabled|
> Pull-down is disabled|   output is disabled| debouncing filter disabled|  
> 0x50000

I noticed the difference there. I'll keep an eye on it whenever I run into
this,

> > My machine is affected by the NVMe slow wakeup bug so I only started to use
> > s2idle since kernel 5.18.
> 
> Sorry what bug?  Are you talking about
> https://github.com/torvalds/linux/commit/
> 455cd867b85b53fd3602345f9b8a8facc551adc9 ?

Yes, that one, at least I think this is what made s2idle usable for me.

> > I'll see if I can downgrade the firmware if you think that'd help you track
> > this down. FTR I'm on version 1.40 which seems to be the latest one as of
> > now.
> 
> > I attached the requested logs.
> 
> I didn't see the issue listed there, I was hoping to understand the timing
> of it.

Apologies, I'll grab a fresh dmesg when it happens again. Although I'm afraid
it won't be very interesting.

> 
> Something I'm wondering is if maybe it's an IRQ destined for the GPIO
> controller but the GPIO controller driver isn't yet loaded?  This could be
> particularly relevant if it's not in your initramfs but rather loaded after
> you entered your encryption key (noticed root=/dev/mapper/cryptroot)

Okay. Would adding pinctrl_amd to initramfs be enough to rule out this
possibility?

> 
> Could you please turn on dynamic debugging for pinctrl_amd
> (pinctrl.dyndbg=+p on your kernel command line )and share a log with it
> reproducing?

Will do.

> 
> > pcie_aspm=force
> 
> FYI - this is generally not a safe thing to do.  If you remove this from
> your kernel command line can you reproduce the issue?

I'm aware but I've been using this machine with forced ASPM for about 2 years
with no apparent issues related to that. I'll give it a go.

> > As such I don't know if this is a regression. I'll see if I can downgrade
> the
> > firmware if you think that'd help you track this down. FTR I'm on version
> > 1.40 which seems to be the latest one as of now.
> 
> I think Mark will have to comment whether it's something they're aware of,
> tracking, etc regression wise.

Thanks a lot! I'll see what I can do and let you know...

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (7 preceding siblings ...)
  2022-07-11 16:30 ` bugzilla-daemon
@ 2022-07-11 16:36 ` bugzilla-daemon
  2022-07-11 17:34 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-11 16:36 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #8 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
> Although I'm afraid it won't be very interesting.

It might not be, but once we know the timing relative to pinctrl-amd loading we
can know if it's that.  If it's not, then our next step will be to crank up
ACPI debugging output to see if it correlates with an ACPI event.

> Okay. Would adding pinctrl_amd to initramfs be enough to rule out this
> possibility?

Yeah that or compiling it as built-in.

> I'm aware but I've been using this machine with forced ASPM for about 2 years
> with no apparent issues related to that. I'll give it a go.

I guess  the reason you do it is for improved power consumption.  I think you
should contrast how much you save with a watt meter and what devices forcing
ASPM actually changes.  If If you find that you have improved power consumption
from something it's better to ask Lenovo to change the defaults in their BIOS. 
They're more well equipped to do stress testing and analysis that PCIe is still
performing properly using the proper equipment.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (8 preceding siblings ...)
  2022-07-11 16:36 ` bugzilla-daemon
@ 2022-07-11 17:34 ` bugzilla-daemon
  2022-07-11 17:34 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-11 17:34 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #9 from madcatx@atlas.cz ---
Luckily I managed to reproduce this very quickly this time.

The state of the GPIO pin32 is now reported as:
pin32       interrupt is disabled| interrupt is masked| disable wakeup in S0i3
state| disable wakeup in S3 state|
 disable wakeup in S4/S5 state|     pull-up is disabled| Pull-down is disabled|
output is high| output is enabled| debouncing filter disabled|   0xc50000

I attached a new dmesg log with the extended pinctrl debugging enabled.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (9 preceding siblings ...)
  2022-07-11 17:34 ` bugzilla-daemon
@ 2022-07-11 17:34 ` bugzilla-daemon
  2022-07-11 17:40 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-11 17:34 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #10 from madcatx@atlas.cz ---
Created attachment 301392
  --> https://bugzilla.kernel.org/attachment.cgi?id=301392&action=edit
dmesg withpinctrl debugging

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (10 preceding siblings ...)
  2022-07-11 17:34 ` bugzilla-daemon
@ 2022-07-11 17:40 ` bugzilla-daemon
  2022-07-11 19:10 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-11 17:40 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #11 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
I think you turned on the wrong pinctrl debugging.  I'm looking for the timing
of these statements:

drivers/pinctrl/pinctrl-amd.c:1061 [pinctrl_amd]amd_gpio_probe =_ "amd gpio
driver loaded\012"
drivers/pinctrl/pinctrl-amd.c:655 [pinctrl_amd]_amd_gpio_irq_handler =_
"Disabling spurious GPIO IRQ %d\012"
drivers/pinctrl/pinctrl-amd.c:632 [pinctrl_amd]_amd_gpio_irq_handler =_ "Waking
due to GPIO register 0x%p: 0x%x"

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (11 preceding siblings ...)
  2022-07-11 17:40 ` bugzilla-daemon
@ 2022-07-11 19:10 ` bugzilla-daemon
  2022-07-11 19:14 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-11 19:10 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #12 from madcatx@atlas.cz ---
Is it pinctrl.dyndbg=+p or pinctrl_amd.dyndbg=+p? I used the former to produce
the last dmesg dump, the latter doesn't seem to do anything.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (12 preceding siblings ...)
  2022-07-11 19:10 ` bugzilla-daemon
@ 2022-07-11 19:14 ` bugzilla-daemon
  2022-07-11 19:37 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-11 19:14 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #13 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
It should be the latter.  You can confirm the statements were turned on by
grepping /sys/kernel/debug/dynamic_debug/control.

If it's not doing anything now is the right time to ask - you do have
CONFIG_PINCTRL_AMD set in your kernel config right?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (13 preceding siblings ...)
  2022-07-11 19:14 ` bugzilla-daemon
@ 2022-07-11 19:37 ` bugzilla-daemon
  2022-07-12  6:52 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-11 19:37 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #14 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
Try using this instead:

dyndbg="module pinctrl_amd +p"

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (14 preceding siblings ...)
  2022-07-11 19:37 ` bugzilla-daemon
@ 2022-07-12  6:52 ` bugzilla-daemon
  2022-07-12 15:34 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-12  6:52 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #15 from madcatx@atlas.cz ---
I made a small discovery today. After adding pinctrl_amd module to my initramfs
I haven't been able to reproduce the problem so far. That may not be indicative
of anything because the problem does not appear every time. A bit more
interesting is my current output of /proc/interrupts. For IRQ9 it currently
shows

 9:          0      11461          0          0          0          0         
0          0          0          0          0          0          0          0 
        0          0  IR-IO-APIC    9-fasteoi   acpi, pinctrl_amd

Notice that the number of recorded interrupts is around 11000. Before I added
pinctrl_amd to initramfs, there were two distinct cases. Either over 100000
interrupts with the kernel complaining about IRQ9 not being serviced or about
90 interrupts and no apparent issues. All of these numbers are from
/proc/interrupts dump done right after logging into DE after a reboot.

I was able to reproduce these figures multiple times. This suggests that some
HW starts to blast IRQs and doesn't stop until it gets serviced. Knowing this,
is there anything else you'd like me to try besides your suggestions above?

My CONFIG_PINCTRL_AMD is set to CONFIG_PINCTRL_AMD=m in my kernel config and
the respective module is loaded.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (15 preceding siblings ...)
  2022-07-12  6:52 ` bugzilla-daemon
@ 2022-07-12 15:34 ` bugzilla-daemon
  2022-07-13  6:33 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-12 15:34 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

Mario Limonciello (AMD) (mario.limonciello@amd.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED

--- Comment #16 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
> Knowing this, is there anything else you'd like me to try besides your
> suggestions above?

Actually this does make something pop out in my mind.

In your failing log above you had shown this:
> [    4.295526] kernel: Disabling IRQ #9

By this happening before pinctrl_amd is loaded might mean that IRQs are not
being serviced anymore for the ACPI SCI driver.  When pinctrl_amd gets loaded
it tries to service IRQ 9 stuff, but it's not destined for that driver and then
you see an IRQ storm from ACPI interrupts.

So IMO continue to monitor to see if putting it into the initramfs helped the
problem afterall, but please leave the dynamic debugging for pinctrl-amd in
place on your kernel command line in case it comes back and we need to look at
the timing of the error and pinctrl-amd loading.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (16 preceding siblings ...)
  2022-07-12 15:34 ` bugzilla-daemon
@ 2022-07-13  6:33 ` bugzilla-daemon
  2022-07-13 17:18 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-13  6:33 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #17 from madcatx@atlas.cz ---
My output from /sys/kernel/debug/dynamic_debug/control lists pinctrl_amd like
this

drivers/pinctrl/pinctrl-amd.c:1056 [pinctrl_amd]amd_gpio_probe =p "amd gpio
driver loaded\012"
drivers/pinctrl/pinctrl-amd.c:655 [pinctrl_amd]do_amd_gpio_irq_handler =p
"Disabling spurious GPIO IRQ %d\012"
drivers/pinctrl/pinctrl-amd.c:632 [pinctrl_amd]do_amd_gpio_irq_handler =p
"Waking due to GPIO %d: 0x%x"

IIRC this means that those debugging statements are enabled. The only relevant
line from dmesg is this though:

[    0.668575] amd_gpio AMDI0030:00: amd gpio driver loaded

I just noticed another case where the number of interrupts reached about 12000.
With a bit of careful optimism I think that loading pinctrl_amd early during
boot puts an early stop to the IRQ storm before the kernel intervenes and
disables the interrupt channel.

As for reproducibility, I think that the machine needs to run for some time
before the problem can be "re-reproduced". This is why it looked like it was
happening only on cold boots. Assuming there an EC connected to these GPIOs I
guess it's some kind of timer overflow...?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (17 preceding siblings ...)
  2022-07-13  6:33 ` bugzilla-daemon
@ 2022-07-13 17:18 ` bugzilla-daemon
  2022-07-13 17:29 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-13 17:18 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #18 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
> IIRC this means that those debugging statements are enabled. The only
> relevant line from dmesg is this though:

Yeah the other two messages you won't be seeing generally.  They're relevant to
certain circumstances.

> [    0.668575] amd_gpio AMDI0030:00: amd gpio driver loaded

The really important thing here is the kernel timestamp.  If you compare it to
the failing log you have above, you can see that the failure happens a lot
later in the boot.

[    4.295441] kernel: irq 9: nobody cared (try booting with the "irqpoll"
option)

> I just noticed another case where the number of interrupts reached about
> 12000. With a bit of careful optimism I think that loading pinctrl_amd early
> during boot puts an early stop to the IRQ storm before the kernel intervenes
> and disables the interrupt channel.

> As for reproducibility, I think that the machine needs to run for some time
> before the problem can be "re-reproduced". This is why it looked like it was
> happening only on cold boots. Assuming there an EC connected to these GPIOs I
> guess it's some kind of timer overflow...?

Careful optimism sounds appropriate, but fingers crossed!

As a general statement it seems that by using an encrypted partition you
exacerbated this problem because you will by definition have a longer boot that
you don't have access to pinctrl-amd's kernel object while getting your
passphrase entered.

I'd say if you don't hit this in the next few days with this change we should
close it as "DOCUMENTED".  Mayyybe we should add something to the Kconfig text
to warn about this?  Not sure what else can really be done from the kernel.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (18 preceding siblings ...)
  2022-07-13 17:18 ` bugzilla-daemon
@ 2022-07-13 17:29 ` bugzilla-daemon
  2022-07-13 18:04 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-13 17:29 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

Hans de Goede (jwrdegoede@fedoraproject.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jwrdegoede@fedoraproject.or
                   |                            |g

--- Comment #19 from Hans de Goede (jwrdegoede@fedoraproject.org) ---
> I'd say if you don't hit this in the next few days with this change we should
> close it as "DOCUMENTED".  Mayyybe we should add something to the Kconfig
> text to warn about this?  Not sure what else can really be done from the
> kernel.

We could consider changing the amd pinctrl option into a bool instead of a
tristate, forcing it to be builtin. In general ACPI tables may rely on the
pinctrl through GPIO OpRegions in various ways. So making sure that these are
available early usually is a good idea. I've also seen issues on some Intel
models when the pinctrl driver is not builtin.

Note I just checked and the Fedora kernels already have:

CONFIG_PINCTRL_AMD=y

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (19 preceding siblings ...)
  2022-07-13 17:29 ` bugzilla-daemon
@ 2022-07-13 18:04 ` bugzilla-daemon
  2022-07-14 11:31 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-13 18:04 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #20 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
Great suggestion, thanks!
I've sent up
https://lore.kernel.org/linux-gpio/20220713175950.964-1-mario.limonciello@amd.com/T/#u
for just that.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (20 preceding siblings ...)
  2022-07-13 18:04 ` bugzilla-daemon
@ 2022-07-14 11:31 ` bugzilla-daemon
  2022-07-14 11:34 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-14 11:31 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #21 from madcatx@atlas.cz ---
Just FTR, I just reproduced another case where the problem most likely would
have occurred - almost 33000 interrupts before the driver took action - but
having pinctrl_amd loaded early seems to be an effective solution.

I guess that Lenovo people may want to look into this, 33000 intrs is sort of
cutting it close to the kernel disabling the interrupt regardless of the driver
being available.

Thanks a lot for all the guidance how to track this down!

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (21 preceding siblings ...)
  2022-07-14 11:31 ` bugzilla-daemon
@ 2022-07-14 11:34 ` bugzilla-daemon
  2022-07-14 13:15 ` bugzilla-daemon
  2022-07-14 13:48 ` bugzilla-daemon
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-14 11:34 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #22 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
Were you still using the initramfs method when 33000 came through?
If so; please switch to built-in.  That should allow the driver to load as
early as possible.

> Thanks a lot for all the guidance how to track this down!

Sure!

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (22 preceding siblings ...)
  2022-07-14 11:34 ` bugzilla-daemon
@ 2022-07-14 13:15 ` bugzilla-daemon
  2022-07-14 13:48 ` bugzilla-daemon
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-14 13:15 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

--- Comment #23 from madcatx@atlas.cz ---
(In reply to Mario Limonciello (AMD) from comment #22)
> Were you still using the initramfs method when 33000 came through?
> If so; please switch to built-in.  That should allow the driver to load as
> early as possible.

Just tried that and the number of unserviced interrupts dropped down to less
than 8000; that's the lowest value I've seen so far. pinctrl_amd gets loaded
about 0.48 secs into boot, it loaded at about 0.67 sec when it was a module.

Anyway, it looks like we pinned the problem down pretty well. Thanks!

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 216230] "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled
  2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
                   ` (23 preceding siblings ...)
  2022-07-14 13:15 ` bugzilla-daemon
@ 2022-07-14 13:48 ` bugzilla-daemon
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2022-07-14 13:48 UTC (permalink / raw)
  To: platform-driver-x86

https://bugzilla.kernel.org/show_bug.cgi?id=216230

Mario Limonciello (AMD) (mario.limonciello@amd.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |DOCUMENTED

--- Comment #24 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
OK thanks.  I'll close it, and we'll work on getting that patch landed to
prevent users from compiling as a module.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-07-14 13:48 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-09 18:24 [Bug 216230] New: "irq9: nobody cared" on Thinkpad T14 Gen1 (AMD) when s2idle is enabled bugzilla-daemon
2022-07-09 18:26 ` [Bug 216230] " bugzilla-daemon
2022-07-10 15:52 ` bugzilla-daemon
2022-07-11  6:56 ` bugzilla-daemon
2022-07-11  6:56 ` bugzilla-daemon
2022-07-11  6:57 ` bugzilla-daemon
2022-07-11  7:02 ` bugzilla-daemon
2022-07-11 16:09 ` bugzilla-daemon
2022-07-11 16:30 ` bugzilla-daemon
2022-07-11 16:36 ` bugzilla-daemon
2022-07-11 17:34 ` bugzilla-daemon
2022-07-11 17:34 ` bugzilla-daemon
2022-07-11 17:40 ` bugzilla-daemon
2022-07-11 19:10 ` bugzilla-daemon
2022-07-11 19:14 ` bugzilla-daemon
2022-07-11 19:37 ` bugzilla-daemon
2022-07-12  6:52 ` bugzilla-daemon
2022-07-12 15:34 ` bugzilla-daemon
2022-07-13  6:33 ` bugzilla-daemon
2022-07-13 17:18 ` bugzilla-daemon
2022-07-13 17:29 ` bugzilla-daemon
2022-07-13 18:04 ` bugzilla-daemon
2022-07-14 11:31 ` bugzilla-daemon
2022-07-14 11:34 ` bugzilla-daemon
2022-07-14 13:15 ` bugzilla-daemon
2022-07-14 13:48 ` bugzilla-daemon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).