linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* ARMv8 SError, panic around msi_set_mask_bit during suspend-to-RAM
@ 2022-09-28 19:01 Kevin Rowland
  2022-09-28 21:56 ` Bjorn Helgaas
  2022-09-29 10:07 ` Manivannan Sadhasivam
  0 siblings, 2 replies; 7+ messages in thread
From: Kevin Rowland @ 2022-09-28 19:01 UTC (permalink / raw)
  To: linux-pci

Hello all.

I'm running a custom kernel based on 5.10.72 on an NXP i.MX8QM (an
ARMv8-A system). The i.MX8QM has 2 designware PCIe 3.0 controllers
on-chip. The driver in use is `drivers/pci/controller/dwc/pci-imx6.c`.

By default the PCIe controller's Link Capabilities are set to support
ASPM L0s only, although the datasheet indicates support for both L0s
and L1. I recently modified the host controller driver to set LnkCap
to indicate ASPM L1 and L0s; one controller is attached to an SSD that
indicates support for L1s, so changing the host controller LnkCap
allowed the `aspm_enabled` bitmask to become non-zero and the ASPM
bits to be set in the LnkCtrl registers on the host and the SSD. This
is the good news.

The bad news is that after the LnkCap change, when I attempt to
suspend-to-RAM, I get an asynchronous exception (an SError to be
specific) at the CPU, resulting in a kernel panic. Here's the tail end
of the pstore log that was captured from the exception and resulting
panic:
```
<6>[   90.850564] psci: CPU3 killed (polled 4 ms)
<4>[   90.851989] IRQ 75: no longer affine to CPU4
<4>[   90.852062] IRQ384: set affinity failed(-22).
<4>[   90.852065] IRQ386: set affinity failed(-22).
<5>[   90.852112] CPU4: shutdown
<6>[   90.853135] psci: CPU4 killed (polled 0 ms)
<4>[   90.856402] IRQ384: set affinity failed(-22).
<4>[   90.856409] Eff. affinity 3-5 of IRQ 386 contains only offline
CPUs after offlining CPU 5
<2>[   90.856536] SError Interrupt on CPU5, code 0xbf000002 -- SError
<4>[   90.856538] CPU: 5 PID: 37 Comm: migration/5 Tainted: G
C O      5.10.72-lts-5.10.y+g886272a218dd #1
<4>[   90.856539] Hardware name: Freescale i.MX8QM, Rivian TCM Board X1 (DT)
<4>[   90.856541] pstate: 00000085 (nzcv daIf -PAN -UAO -TCO BTYPE=--)
<4>[   90.856542] pc : msi_set_mask_bit.isra.0+0x40/0x90
<4>[   90.856544] lr : msi_set_mask_bit.isra.0+0x34/0x90
<4>[   90.856545] sp : ffff800012743bf0
<4>[   90.856546] x29: ffff800012743bf0 x28: 0000000000000005
<4>[   90.856550] x27: ffff000806260218 x26: ffff8000121c9260
<4>[   90.856553] x25: ffff800011d1b000 x24: ffff8000121c9480
<4>[   90.856556] x23: ffff80001225a768 x22: ffff000806260260
<4>[   90.856559] x21: ffff0008062602dc x20: ffff000806260200
<4>[   90.856562] x19: ffff00080286b200 x18: 0000000000000020
<4>[   90.856565] x17: 0000000000000001 x16: 0000000000000019
<4>[   90.856567] x15: ffff0008023812f8 x14: 6e696e696c66666f
<4>[   90.856570] x13: 2072657466612073 x12: 55504320656e696c
<4>[   90.856572] x11: 66666f20796c6e6f x10: 20736e6961746e6f
<4>[   90.856575] x9 : ffff8000106aa084 x8 : 4920666f20352d33
<4>[   90.856578] x7 : 207974696e696666 x6 : 0000000000000000
<4>[   90.856581] x5 : 0000000000000000 x4 : 0000000000000000
<4>[   90.856584] x3 : 0000000000000001 x2 : ffff800012d6302c
<4>[   90.856586] x1 : 0000000000000001 x0 : 0000000000000000
<0>[   90.856589] Kernel panic - not syncing: Asynchronous SError Interrupt
<4>[   90.856591] CPU: 5 PID: 37 Comm: migration/5 Tainted: G
C O      5.10.72-lts-5.10.y+g886272a218dd #1
<4>[   90.856592] Hardware name: Freescale i.MX8QM
<4>[   90.856594] Call trace:
<4>[   90.856595]  dump_backtrace+0x0/0x1b0
<4>[   90.856596]  show_stack+0x24/0x30
<4>[   90.856597]  dump_stack+0xd0/0x12c
<4>[   90.856598]  panic+0x178/0x380
<4>[   90.856600]  nmi_panic+0x98/0xa0
<4>[   90.856601]  arm64_serror_panic+0x8c/0x98
<4>[   90.856602]  do_serror+0x64/0x6c
<4>[   90.856603]  el1_error+0x90/0x110
<4>[   90.856604]  msi_set_mask_bit.isra.0+0x40/0x90
<4>[   90.856605]  pci_msi_mask_irq+0x2c/0x40
<4>[   90.856606]  dw_msi_mask_irq+0x24/0x40
<4>[   90.856607]  irq_shutdown+0xa4/0xe0
<4>[   90.856609]  irq_shutdown_and_deactivate+0x24/0x3c
<4>[   90.856610]  irq_migrate_all_off_this_cpu+0x260/0x290
<4>[   90.856611]  __cpu_disable+0xd8/0xf0
<4>[   90.856612]  take_cpu_down+0x48/0xf0
<4>[   90.856613]  multi_cpu_stop+0xb4/0x1a0
<4>[   90.856614]  cpu_stopper_thread+0xa0/0x130
<4>[   90.856615]  smpboot_thread_fn+0x25c/0x290
<4>[   90.856616]  kthread+0x164/0x16c
<4>[   90.856617]  ret_from_fork+0x10/0x30
<2>[   91.856635] SMP: stopping secondary CPUs
<4>[   91.856636] SMP: failed to stop secondary CPUs 0
<0>[   91.856637] Kernel Offset: disabled
<0>[   91.856639] CPU features: 0x0240022,2100600c
<0>[   91.856640] Memory Limit: none
```

Note that the exception occurs right around when we're masking MSIs,
which happens because CPU5 goes offline and the kernel recognizes that
there are no more CPUs left to handle those interrupts. To be a little
more specific, the PC indicates that the CPU was executing a data
memory barrier when the exception arrived. Source here [1] and
disassembly below with my annotation showing the PC when the exception
hit:

```
       1b64:       b9005260        str     w0, [x19, #80]
           asm volatile(ALTERNATIVE("ldr %w0, [%1]",
       1b68:       f9403260        ldr     x0, [x19, #96]
       1b6c:       b9400000        ldr     w0, [x0]
                   readl(desc->mask_base);         /* Flush write to
device */
       1b70:       d5033dbf        dmb     ld      <--- PC during exception
       1b74:       2a0003e0        mov     w0, w0
       1b78:       ca000000        eor     x0, x0, x0
```

At the point of the exception I believe that the PCIe controller is
powered down, although I haven't confirmed.

- - -

I'm trying to understand what's going wrong but I've hit a wall. I
thought the act of writing to the MSI_MASK bit while the HC is powered
down is what caused the issue, but I hacked a fix to avoid calling
`msi_set_mask_bit()` during suspend-to-RAM and still got the
exception.

At this point I'm wondering why we mask MSIs so late in the suspend
process (right when the last non-boot CPU is taken offline). Shouldn't
we disable/mask these IRQs as part of host controller suspend?

I'm also wondering if maybe the PCIe _device_ - the SSD - is writing
into host controller memory after the HC is powered down, which could
cause an exception on the data bus that would obviously be
asynchronous to the CPU. But I've just started learning about PCIe and
I have a very fuzzy understanding of how data flows during an MSI.

Any advice would be welcome.

Thanks,
Kevin

[1] https://source.codeaurora.org/external/imx/linux-imx/tree/drivers/pci/msi.c?h=lf-5.10.y#n243

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ARMv8 SError, panic around msi_set_mask_bit during suspend-to-RAM
  2022-09-28 19:01 ARMv8 SError, panic around msi_set_mask_bit during suspend-to-RAM Kevin Rowland
@ 2022-09-28 21:56 ` Bjorn Helgaas
  2022-09-29  9:08   ` Marc Zyngier
  2022-09-29 10:07 ` Manivannan Sadhasivam
  1 sibling, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2022-09-28 21:56 UTC (permalink / raw)
  To: Kevin Rowland
  Cc: linux-pci, Marc Zyngier, Thomas Gleixner, Manivannan Sadhasivam,
	Krishna Chaitanya Chundru

[+cc Marc, Thomas, Manivannan, Krishna because there's recent
discussion about changing MSI IRQ affinity late in shutdown:
https://lore.kernel.org/linux-pci/?q=b%3Amsi+b%3Aaffinity+]

On Wed, Sep 28, 2022 at 12:01:55PM -0700, Kevin Rowland wrote:
> Hello all.
> 
> I'm running a custom kernel based on 5.10.72 on an NXP i.MX8QM (an
> ARMv8-A system). The i.MX8QM has 2 designware PCIe 3.0 controllers
> on-chip. The driver in use is `drivers/pci/controller/dwc/pci-imx6.c`.
> 
> By default the PCIe controller's Link Capabilities are set to support
> ASPM L0s only, although the datasheet indicates support for both L0s
> and L1. I recently modified the host controller driver to set LnkCap
> to indicate ASPM L1 and L0s; one controller is attached to an SSD that
> indicates support for L1s, so changing the host controller LnkCap
> allowed the `aspm_enabled` bitmask to become non-zero and the ASPM
> bits to be set in the LnkCtrl registers on the host and the SSD. This
> is the good news.
> 
> The bad news is that after the LnkCap change, when I attempt to
> suspend-to-RAM, I get an asynchronous exception (an SError to be
> specific) at the CPU, resulting in a kernel panic. Here's the tail end
> of the pstore log that was captured from the exception and resulting
> panic:
> ```
> <6>[   90.850564] psci: CPU3 killed (polled 4 ms)
> <4>[   90.851989] IRQ 75: no longer affine to CPU4
> <4>[   90.852062] IRQ384: set affinity failed(-22).
> <4>[   90.852065] IRQ386: set affinity failed(-22).
> <5>[   90.852112] CPU4: shutdown
> <6>[   90.853135] psci: CPU4 killed (polled 0 ms)
> <4>[   90.856402] IRQ384: set affinity failed(-22).
> <4>[   90.856409] Eff. affinity 3-5 of IRQ 386 contains only offline
> CPUs after offlining CPU 5
> <2>[   90.856536] SError Interrupt on CPU5, code 0xbf000002 -- SError
> <4>[   90.856538] CPU: 5 PID: 37 Comm: migration/5 Tainted: G
> C O      5.10.72-lts-5.10.y+g886272a218dd #1
> <4>[   90.856539] Hardware name: Freescale i.MX8QM, Rivian TCM Board X1 (DT)
> <4>[   90.856541] pstate: 00000085 (nzcv daIf -PAN -UAO -TCO BTYPE=--)
> <4>[   90.856542] pc : msi_set_mask_bit.isra.0+0x40/0x90
> <4>[   90.856544] lr : msi_set_mask_bit.isra.0+0x34/0x90
> <4>[   90.856545] sp : ffff800012743bf0
> <4>[   90.856546] x29: ffff800012743bf0 x28: 0000000000000005
> <4>[   90.856550] x27: ffff000806260218 x26: ffff8000121c9260
> <4>[   90.856553] x25: ffff800011d1b000 x24: ffff8000121c9480
> <4>[   90.856556] x23: ffff80001225a768 x22: ffff000806260260
> <4>[   90.856559] x21: ffff0008062602dc x20: ffff000806260200
> <4>[   90.856562] x19: ffff00080286b200 x18: 0000000000000020
> <4>[   90.856565] x17: 0000000000000001 x16: 0000000000000019
> <4>[   90.856567] x15: ffff0008023812f8 x14: 6e696e696c66666f
> <4>[   90.856570] x13: 2072657466612073 x12: 55504320656e696c
> <4>[   90.856572] x11: 66666f20796c6e6f x10: 20736e6961746e6f
> <4>[   90.856575] x9 : ffff8000106aa084 x8 : 4920666f20352d33
> <4>[   90.856578] x7 : 207974696e696666 x6 : 0000000000000000
> <4>[   90.856581] x5 : 0000000000000000 x4 : 0000000000000000
> <4>[   90.856584] x3 : 0000000000000001 x2 : ffff800012d6302c
> <4>[   90.856586] x1 : 0000000000000001 x0 : 0000000000000000
> <0>[   90.856589] Kernel panic - not syncing: Asynchronous SError Interrupt
> <4>[   90.856591] CPU: 5 PID: 37 Comm: migration/5 Tainted: G
> C O      5.10.72-lts-5.10.y+g886272a218dd #1
> <4>[   90.856592] Hardware name: Freescale i.MX8QM
> <4>[   90.856594] Call trace:
> <4>[   90.856595]  dump_backtrace+0x0/0x1b0
> <4>[   90.856596]  show_stack+0x24/0x30
> <4>[   90.856597]  dump_stack+0xd0/0x12c
> <4>[   90.856598]  panic+0x178/0x380
> <4>[   90.856600]  nmi_panic+0x98/0xa0
> <4>[   90.856601]  arm64_serror_panic+0x8c/0x98
> <4>[   90.856602]  do_serror+0x64/0x6c
> <4>[   90.856603]  el1_error+0x90/0x110
> <4>[   90.856604]  msi_set_mask_bit.isra.0+0x40/0x90
> <4>[   90.856605]  pci_msi_mask_irq+0x2c/0x40
> <4>[   90.856606]  dw_msi_mask_irq+0x24/0x40
> <4>[   90.856607]  irq_shutdown+0xa4/0xe0
> <4>[   90.856609]  irq_shutdown_and_deactivate+0x24/0x3c
> <4>[   90.856610]  irq_migrate_all_off_this_cpu+0x260/0x290
> <4>[   90.856611]  __cpu_disable+0xd8/0xf0
> <4>[   90.856612]  take_cpu_down+0x48/0xf0
> <4>[   90.856613]  multi_cpu_stop+0xb4/0x1a0
> <4>[   90.856614]  cpu_stopper_thread+0xa0/0x130
> <4>[   90.856615]  smpboot_thread_fn+0x25c/0x290
> <4>[   90.856616]  kthread+0x164/0x16c
> <4>[   90.856617]  ret_from_fork+0x10/0x30
> <2>[   91.856635] SMP: stopping secondary CPUs
> <4>[   91.856636] SMP: failed to stop secondary CPUs 0
> <0>[   91.856637] Kernel Offset: disabled
> <0>[   91.856639] CPU features: 0x0240022,2100600c
> <0>[   91.856640] Memory Limit: none
> ```
> 
> Note that the exception occurs right around when we're masking MSIs,
> which happens because CPU5 goes offline and the kernel recognizes that
> there are no more CPUs left to handle those interrupts. To be a little
> more specific, the PC indicates that the CPU was executing a data
> memory barrier when the exception arrived. Source here [1] and
> disassembly below with my annotation showing the PC when the exception
> hit:
> 
> ```
>        1b64:       b9005260        str     w0, [x19, #80]
>            asm volatile(ALTERNATIVE("ldr %w0, [%1]",
>        1b68:       f9403260        ldr     x0, [x19, #96]
>        1b6c:       b9400000        ldr     w0, [x0]
>                    readl(desc->mask_base);         /* Flush write to
> device */
>        1b70:       d5033dbf        dmb     ld      <--- PC during exception
>        1b74:       2a0003e0        mov     w0, w0
>        1b78:       ca000000        eor     x0, x0, x0
> ```
> 
> At the point of the exception I believe that the PCIe controller is
> powered down, although I haven't confirmed.
> 
> - - -
> 
> I'm trying to understand what's going wrong but I've hit a wall. I
> thought the act of writing to the MSI_MASK bit while the HC is powered
> down is what caused the issue, but I hacked a fix to avoid calling
> `msi_set_mask_bit()` during suspend-to-RAM and still got the
> exception.
> 
> At this point I'm wondering why we mask MSIs so late in the suspend
> process (right when the last non-boot CPU is taken offline). Shouldn't
> we disable/mask these IRQs as part of host controller suspend?
> 
> I'm also wondering if maybe the PCIe _device_ - the SSD - is writing
> into host controller memory after the HC is powered down, which could
> cause an exception on the data bus that would obviously be
> asynchronous to the CPU. But I've just started learning about PCIe and
> I have a very fuzzy understanding of how data flows during an MSI.
> 
> Any advice would be welcome.
> 
> Thanks,
> Kevin
> 
> [1] https://source.codeaurora.org/external/imx/linux-imx/tree/drivers/pci/msi.c?h=lf-5.10.y#n243

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ARMv8 SError, panic around msi_set_mask_bit during suspend-to-RAM
  2022-09-28 21:56 ` Bjorn Helgaas
@ 2022-09-29  9:08   ` Marc Zyngier
  2022-09-29 20:07     ` Kevin Rowland
  0 siblings, 1 reply; 7+ messages in thread
From: Marc Zyngier @ 2022-09-29  9:08 UTC (permalink / raw)
  To: Bjorn Helgaas, Kevin Rowland
  Cc: linux-pci, Thomas Gleixner, Manivannan Sadhasivam,
	Krishna Chaitanya Chundru

On Wed, 28 Sep 2022 17:56:01 -0400,
Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> [+cc Marc, Thomas, Manivannan, Krishna because there's recent
> discussion about changing MSI IRQ affinity late in shutdown:
> https://lore.kernel.org/linux-pci/?q=b%3Amsi+b%3Aaffinity+]
> 
> On Wed, Sep 28, 2022 at 12:01:55PM -0700, Kevin Rowland wrote:
> > Hello all.
> > 
> > I'm running a custom kernel based on 5.10.72 on an NXP i.MX8QM (an
> > ARMv8-A system). The i.MX8QM has 2 designware PCIe 3.0 controllers
> > on-chip. The driver in use is `drivers/pci/controller/dwc/pci-imx6.c`.
> > 
> > By default the PCIe controller's Link Capabilities are set to support
> > ASPM L0s only, although the datasheet indicates support for both L0s
> > and L1. I recently modified the host controller driver to set LnkCap
> > to indicate ASPM L1 and L0s; one controller is attached to an SSD that
> > indicates support for L1s, so changing the host controller LnkCap
> > allowed the `aspm_enabled` bitmask to become non-zero and the ASPM
> > bits to be set in the LnkCtrl registers on the host and the SSD. This
> > is the good news.
> > 
> > The bad news is that after the LnkCap change, when I attempt to
> > suspend-to-RAM, I get an asynchronous exception (an SError to be
> > specific) at the CPU, resulting in a kernel panic. Here's the tail end
> > of the pstore log that was captured from the exception and resulting
> > panic:
> > ```
> > <6>[   90.850564] psci: CPU3 killed (polled 4 ms)
> > <4>[   90.851989] IRQ 75: no longer affine to CPU4
> > <4>[   90.852062] IRQ384: set affinity failed(-22).
> > <4>[   90.852065] IRQ386: set affinity failed(-22).
> > <5>[   90.852112] CPU4: shutdown
> > <6>[   90.853135] psci: CPU4 killed (polled 0 ms)
> > <4>[   90.856402] IRQ384: set affinity failed(-22).
> > <4>[   90.856409] Eff. affinity 3-5 of IRQ 386 contains only offline
> > CPUs after offlining CPU 5
> > <2>[   90.856536] SError Interrupt on CPU5, code 0xbf000002 -- SError
> > <4>[   90.856538] CPU: 5 PID: 37 Comm: migration/5 Tainted: G
> > C O      5.10.72-lts-5.10.y+g886272a218dd #1
> > <4>[   90.856539] Hardware name: Freescale i.MX8QM, Rivian TCM Board X1 (DT)
> > <4>[   90.856541] pstate: 00000085 (nzcv daIf -PAN -UAO -TCO BTYPE=--)
> > <4>[   90.856542] pc : msi_set_mask_bit.isra.0+0x40/0x90
> > <4>[   90.856544] lr : msi_set_mask_bit.isra.0+0x34/0x90
> > <4>[   90.856545] sp : ffff800012743bf0
> > <4>[   90.856546] x29: ffff800012743bf0 x28: 0000000000000005
> > <4>[   90.856550] x27: ffff000806260218 x26: ffff8000121c9260
> > <4>[   90.856553] x25: ffff800011d1b000 x24: ffff8000121c9480
> > <4>[   90.856556] x23: ffff80001225a768 x22: ffff000806260260
> > <4>[   90.856559] x21: ffff0008062602dc x20: ffff000806260200
> > <4>[   90.856562] x19: ffff00080286b200 x18: 0000000000000020
> > <4>[   90.856565] x17: 0000000000000001 x16: 0000000000000019
> > <4>[   90.856567] x15: ffff0008023812f8 x14: 6e696e696c66666f
> > <4>[   90.856570] x13: 2072657466612073 x12: 55504320656e696c
> > <4>[   90.856572] x11: 66666f20796c6e6f x10: 20736e6961746e6f
> > <4>[   90.856575] x9 : ffff8000106aa084 x8 : 4920666f20352d33
> > <4>[   90.856578] x7 : 207974696e696666 x6 : 0000000000000000
> > <4>[   90.856581] x5 : 0000000000000000 x4 : 0000000000000000
> > <4>[   90.856584] x3 : 0000000000000001 x2 : ffff800012d6302c
> > <4>[   90.856586] x1 : 0000000000000001 x0 : 0000000000000000
> > <0>[   90.856589] Kernel panic - not syncing: Asynchronous SError Interrupt
> > <4>[   90.856591] CPU: 5 PID: 37 Comm: migration/5 Tainted: G
> > C O      5.10.72-lts-5.10.y+g886272a218dd #1
> > <4>[   90.856592] Hardware name: Freescale i.MX8QM
> > <4>[   90.856594] Call trace:
> > <4>[   90.856595]  dump_backtrace+0x0/0x1b0
> > <4>[   90.856596]  show_stack+0x24/0x30
> > <4>[   90.856597]  dump_stack+0xd0/0x12c
> > <4>[   90.856598]  panic+0x178/0x380
> > <4>[   90.856600]  nmi_panic+0x98/0xa0
> > <4>[   90.856601]  arm64_serror_panic+0x8c/0x98
> > <4>[   90.856602]  do_serror+0x64/0x6c
> > <4>[   90.856603]  el1_error+0x90/0x110
> > <4>[   90.856604]  msi_set_mask_bit.isra.0+0x40/0x90
> > <4>[   90.856605]  pci_msi_mask_irq+0x2c/0x40
> > <4>[   90.856606]  dw_msi_mask_irq+0x24/0x40
> > <4>[   90.856607]  irq_shutdown+0xa4/0xe0
> > <4>[   90.856609]  irq_shutdown_and_deactivate+0x24/0x3c
> > <4>[   90.856610]  irq_migrate_all_off_this_cpu+0x260/0x290
> > <4>[   90.856611]  __cpu_disable+0xd8/0xf0
> > <4>[   90.856612]  take_cpu_down+0x48/0xf0
> > <4>[   90.856613]  multi_cpu_stop+0xb4/0x1a0
> > <4>[   90.856614]  cpu_stopper_thread+0xa0/0x130
> > <4>[   90.856615]  smpboot_thread_fn+0x25c/0x290
> > <4>[   90.856616]  kthread+0x164/0x16c
> > <4>[   90.856617]  ret_from_fork+0x10/0x30
> > <2>[   91.856635] SMP: stopping secondary CPUs
> > <4>[   91.856636] SMP: failed to stop secondary CPUs 0
> > <0>[   91.856637] Kernel Offset: disabled
> > <0>[   91.856639] CPU features: 0x0240022,2100600c
> > <0>[   91.856640] Memory Limit: none
> > ```
> > 
> > Note that the exception occurs right around when we're masking MSIs,
> > which happens because CPU5 goes offline and the kernel recognizes that
> > there are no more CPUs left to handle those interrupts. To be a little
> > more specific, the PC indicates that the CPU was executing a data
> > memory barrier when the exception arrived. Source here [1] and
> > disassembly below with my annotation showing the PC when the exception
> > hit:
> > 
> > ```
> >        1b64:       b9005260        str     w0, [x19, #80]
> >            asm volatile(ALTERNATIVE("ldr %w0, [%1]",
> >        1b68:       f9403260        ldr     x0, [x19, #96]
> >        1b6c:       b9400000        ldr     w0, [x0]
> >                    readl(desc->mask_base);         /* Flush write to
> > device */
> >        1b70:       d5033dbf        dmb     ld      <--- PC during exception
> >        1b74:       2a0003e0        mov     w0, w0
> >        1b78:       ca000000        eor     x0, x0, x0
> > ```
> > 
> > At the point of the exception I believe that the PCIe controller is
> > powered down, although I haven't confirmed.

You can't rely on the PC reported by the exception here, as a SError
is asynchronous. This dmb is a red herring: it has no effect on the
target of the load, it just orders subsequent loads after it (it'd be
different if this was a dsb). So the problem is most probably
elsewhere.

A read may be a reasonable indication (assuming the SError is caused
because the read times-out), but it could also have been caused by
something else *earlier* in the instruction stream. Or be generated by
something else altogether, outside of the CPU.

> > 
> > - - -
> > 
> > I'm trying to understand what's going wrong but I've hit a wall. I
> > thought the act of writing to the MSI_MASK bit while the HC is powered
> > down is what caused the issue, but I hacked a fix to avoid calling
> > `msi_set_mask_bit()` during suspend-to-RAM and still got the
> > exception.

Right, here you go. So either this is caused earlier in the suspend
process, or by something on the device side. It would help if you
could bisect this.

> > At this point I'm wondering why we mask MSIs so late in the suspend
> > process (right when the last non-boot CPU is taken offline). Shouldn't
> > we disable/mask these IRQs as part of host controller suspend?

This is what we do for *all* interrupts. The question is really
whether there is any form of PM involved when accessing the device
implementing the masking.

Another thing that I have wondered about for a long while is that
there is a pattern that has been cargo-culted all over the tree
(including by me), where masking an MSI is done both in the interrupt
controller and in the endpoint. For example:

static void dw_msi_mask_irq(struct irq_data *d)
{
	pci_msi_mask_irq(d);
	irq_chip_mask_parent(d);
}

static void dw_msi_unmask_irq(struct irq_data *d)
{
	pci_msi_unmask_irq(d);
	irq_chip_unmask_parent(d);
}

Normally, we should be able to limit only one of these should be
called (because masking both in the irqchip and in the PCI device is
redundant, and there is no guarantee that a non-MSI-X device
implements masking anyway), and I'd expect that removing
pci_msi_[un]mask_irq() could help here.

> > I'm also wondering if maybe the PCIe _device_ - the SSD - is writing
> > into host controller memory after the HC is powered down, which could
> > cause an exception on the data bus that would obviously be
> > asynchronous to the CPU. But I've just started learning about PCIe and
> > I have a very fuzzy understanding of how data flows during an MSI.

If the RC is down, I'm not sure how it could propagate errors back to
the CPU. But hey, I've seen more bizarre things...

HTH,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ARMv8 SError, panic around msi_set_mask_bit during suspend-to-RAM
  2022-09-28 19:01 ARMv8 SError, panic around msi_set_mask_bit during suspend-to-RAM Kevin Rowland
  2022-09-28 21:56 ` Bjorn Helgaas
@ 2022-09-29 10:07 ` Manivannan Sadhasivam
  2022-09-29 20:22   ` Kevin Rowland
  1 sibling, 1 reply; 7+ messages in thread
From: Manivannan Sadhasivam @ 2022-09-29 10:07 UTC (permalink / raw)
  To: Kevin Rowland; +Cc: linux-pci

On Wed, Sep 28, 2022 at 12:01:55PM -0700, Kevin Rowland wrote:
> Hello all.
> 
> I'm running a custom kernel based on 5.10.72 on an NXP i.MX8QM (an
> ARMv8-A system). The i.MX8QM has 2 designware PCIe 3.0 controllers
> on-chip. The driver in use is `drivers/pci/controller/dwc/pci-imx6.c`.
> 
> By default the PCIe controller's Link Capabilities are set to support
> ASPM L0s only, although the datasheet indicates support for both L0s
> and L1. I recently modified the host controller driver to set LnkCap
> to indicate ASPM L1 and L0s; one controller is attached to an SSD that
> indicates support for L1s, so changing the host controller LnkCap
> allowed the `aspm_enabled` bitmask to become non-zero and the ASPM
> bits to be set in the LnkCtrl registers on the host and the SSD. This
> is the good news.
> 
> The bad news is that after the LnkCap change, when I attempt to
> suspend-to-RAM, I get an asynchronous exception (an SError to be
> specific) at the CPU, resulting in a kernel panic. Here's the tail end
> of the pstore log that was captured from the exception and resulting
> panic:
> ```
> <6>[   90.850564] psci: CPU3 killed (polled 4 ms)
> <4>[   90.851989] IRQ 75: no longer affine to CPU4
> <4>[   90.852062] IRQ384: set affinity failed(-22).
> <4>[   90.852065] IRQ386: set affinity failed(-22).
> <5>[   90.852112] CPU4: shutdown
> <6>[   90.853135] psci: CPU4 killed (polled 0 ms)
> <4>[   90.856402] IRQ384: set affinity failed(-22).
> <4>[   90.856409] Eff. affinity 3-5 of IRQ 386 contains only offline
> CPUs after offlining CPU 5
> <2>[   90.856536] SError Interrupt on CPU5, code 0xbf000002 -- SError
> <4>[   90.856538] CPU: 5 PID: 37 Comm: migration/5 Tainted: G
> C O      5.10.72-lts-5.10.y+g886272a218dd #1
> <4>[   90.856539] Hardware name: Freescale i.MX8QM, Rivian TCM Board X1 (DT)
> <4>[   90.856541] pstate: 00000085 (nzcv daIf -PAN -UAO -TCO BTYPE=--)
> <4>[   90.856542] pc : msi_set_mask_bit.isra.0+0x40/0x90
> <4>[   90.856544] lr : msi_set_mask_bit.isra.0+0x34/0x90
> <4>[   90.856545] sp : ffff800012743bf0
> <4>[   90.856546] x29: ffff800012743bf0 x28: 0000000000000005
> <4>[   90.856550] x27: ffff000806260218 x26: ffff8000121c9260
> <4>[   90.856553] x25: ffff800011d1b000 x24: ffff8000121c9480
> <4>[   90.856556] x23: ffff80001225a768 x22: ffff000806260260
> <4>[   90.856559] x21: ffff0008062602dc x20: ffff000806260200
> <4>[   90.856562] x19: ffff00080286b200 x18: 0000000000000020
> <4>[   90.856565] x17: 0000000000000001 x16: 0000000000000019
> <4>[   90.856567] x15: ffff0008023812f8 x14: 6e696e696c66666f
> <4>[   90.856570] x13: 2072657466612073 x12: 55504320656e696c
> <4>[   90.856572] x11: 66666f20796c6e6f x10: 20736e6961746e6f
> <4>[   90.856575] x9 : ffff8000106aa084 x8 : 4920666f20352d33
> <4>[   90.856578] x7 : 207974696e696666 x6 : 0000000000000000
> <4>[   90.856581] x5 : 0000000000000000 x4 : 0000000000000000
> <4>[   90.856584] x3 : 0000000000000001 x2 : ffff800012d6302c
> <4>[   90.856586] x1 : 0000000000000001 x0 : 0000000000000000
> <0>[   90.856589] Kernel panic - not syncing: Asynchronous SError Interrupt
> <4>[   90.856591] CPU: 5 PID: 37 Comm: migration/5 Tainted: G
> C O      5.10.72-lts-5.10.y+g886272a218dd #1
> <4>[   90.856592] Hardware name: Freescale i.MX8QM
> <4>[   90.856594] Call trace:
> <4>[   90.856595]  dump_backtrace+0x0/0x1b0
> <4>[   90.856596]  show_stack+0x24/0x30
> <4>[   90.856597]  dump_stack+0xd0/0x12c
> <4>[   90.856598]  panic+0x178/0x380
> <4>[   90.856600]  nmi_panic+0x98/0xa0
> <4>[   90.856601]  arm64_serror_panic+0x8c/0x98
> <4>[   90.856602]  do_serror+0x64/0x6c
> <4>[   90.856603]  el1_error+0x90/0x110
> <4>[   90.856604]  msi_set_mask_bit.isra.0+0x40/0x90
> <4>[   90.856605]  pci_msi_mask_irq+0x2c/0x40
> <4>[   90.856606]  dw_msi_mask_irq+0x24/0x40
> <4>[   90.856607]  irq_shutdown+0xa4/0xe0
> <4>[   90.856609]  irq_shutdown_and_deactivate+0x24/0x3c
> <4>[   90.856610]  irq_migrate_all_off_this_cpu+0x260/0x290
> <4>[   90.856611]  __cpu_disable+0xd8/0xf0
> <4>[   90.856612]  take_cpu_down+0x48/0xf0
> <4>[   90.856613]  multi_cpu_stop+0xb4/0x1a0
> <4>[   90.856614]  cpu_stopper_thread+0xa0/0x130
> <4>[   90.856615]  smpboot_thread_fn+0x25c/0x290
> <4>[   90.856616]  kthread+0x164/0x16c
> <4>[   90.856617]  ret_from_fork+0x10/0x30
> <2>[   91.856635] SMP: stopping secondary CPUs
> <4>[   91.856636] SMP: failed to stop secondary CPUs 0
> <0>[   91.856637] Kernel Offset: disabled
> <0>[   91.856639] CPU features: 0x0240022,2100600c
> <0>[   91.856640] Memory Limit: none
> ```
> 
> Note that the exception occurs right around when we're masking MSIs,
> which happens because CPU5 goes offline and the kernel recognizes that
> there are no more CPUs left to handle those interrupts. To be a little
> more specific, the PC indicates that the CPU was executing a data
> memory barrier when the exception arrived. Source here [1] and
> disassembly below with my annotation showing the PC when the exception
> hit:
> 
> ```
>        1b64:       b9005260        str     w0, [x19, #80]
>            asm volatile(ALTERNATIVE("ldr %w0, [%1]",
>        1b68:       f9403260        ldr     x0, [x19, #96]
>        1b6c:       b9400000        ldr     w0, [x0]
>                    readl(desc->mask_base);         /* Flush write to
> device */
>        1b70:       d5033dbf        dmb     ld      <--- PC during exception
>        1b74:       2a0003e0        mov     w0, w0
>        1b78:       ca000000        eor     x0, x0, x0
> ```
> 
> At the point of the exception I believe that the PCIe controller is
> powered down, although I haven't confirmed.

Does the pcie-imx6 driver you are using has the PM support? On the Qcom
platforms, we are facing a similar issue where we powerdown the PCIe
controller in the suspend callback of the driver and hitting the SError
while the kernel tries to mask the MSI during offlining of the boot CPU.

For working around this issue, we used syscore_ops() suspend/resume
callbacks in the Qcom PCIe driver.

One might argue that we should not power down the PCIe controller during
suspend, but that's not going to be super efficient in power constraint
devices like mobile phones, laptops...

> 
> - - -
> 
> I'm trying to understand what's going wrong but I've hit a wall. I
> thought the act of writing to the MSI_MASK bit while the HC is powered
> down is what caused the issue, but I hacked a fix to avoid calling
> `msi_set_mask_bit()` during suspend-to-RAM and still got the
> exception.
> 
> At this point I'm wondering why we mask MSIs so late in the suspend
> process (right when the last non-boot CPU is taken offline). Shouldn't
> we disable/mask these IRQs as part of host controller suspend?
> 
> I'm also wondering if maybe the PCIe _device_ - the SSD - is writing
> into host controller memory after the HC is powered down, which could
> cause an exception on the data bus that would obviously be
> asynchronous to the CPU. But I've just started learning about PCIe and
> I have a very fuzzy understanding of how data flows during an MSI.
> 

Well, probability is very low. During testing on our platforms, we
found that the NVMe device was still transmitting messages (don't remember
which one atm) even after suspending it from the NVMe driver.

But that should not cause any SError on the host CPU unless the PCIe
device writes to the unclocked memory. Usually, the PCIe controller/phy
will terminate those messages if the controller is powered down.

Thanks,
Mani

> Any advice would be welcome.
> 
> Thanks,
> Kevin
> 
> [1] https://source.codeaurora.org/external/imx/linux-imx/tree/drivers/pci/msi.c?h=lf-5.10.y#n243

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ARMv8 SError, panic around msi_set_mask_bit during suspend-to-RAM
  2022-09-29  9:08   ` Marc Zyngier
@ 2022-09-29 20:07     ` Kevin Rowland
  2022-09-30 12:53       ` Marc Zyngier
  0 siblings, 1 reply; 7+ messages in thread
From: Kevin Rowland @ 2022-09-29 20:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Bjorn Helgaas, linux-pci, Thomas Gleixner, Manivannan Sadhasivam,
	Krishna Chaitanya Chundru

On Thu, Sep 29, 2022 at 2:08 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Wed, 28 Sep 2022 17:56:01 -0400,
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > [+cc Marc, Thomas, Manivannan, Krishna because there's recent
> > discussion about changing MSI IRQ affinity late in shutdown:
> > https://lore.kernel.org/linux-pci/?q=b%3Amsi+b%3Aaffinity+]
> >
> > On Wed, Sep 28, 2022 at 12:01:55PM -0700, Kevin Rowland wrote:
> > > Hello all.
> > >
> > > I'm running a custom kernel based on 5.10.72 on an NXP i.MX8QM (an
> > > ARMv8-A system). The i.MX8QM has 2 designware PCIe 3.0 controllers
> > > on-chip. The driver in use is `drivers/pci/controller/dwc/pci-imx6.c`.
> > >
> > > By default the PCIe controller's Link Capabilities are set to support
> > > ASPM L0s only, although the datasheet indicates support for both L0s
> > > and L1. I recently modified the host controller driver to set LnkCap
> > > to indicate ASPM L1 and L0s; one controller is attached to an SSD that
> > > indicates support for L1s, so changing the host controller LnkCap
> > > allowed the `aspm_enabled` bitmask to become non-zero and the ASPM
> > > bits to be set in the LnkCtrl registers on the host and the SSD. This
> > > is the good news.
> > >
> > > The bad news is that after the LnkCap change, when I attempt to
> > > suspend-to-RAM, I get an asynchronous exception (an SError to be
> > > specific) at the CPU, resulting in a kernel panic. Here's the tail end
> > > of the pstore log that was captured from the exception and resulting
> > > panic:
> > > ```
> > > <6>[   90.850564] psci: CPU3 killed (polled 4 ms)
> > > <4>[   90.851989] IRQ 75: no longer affine to CPU4
> > > <4>[   90.852062] IRQ384: set affinity failed(-22).
> > > <4>[   90.852065] IRQ386: set affinity failed(-22).
> > > <5>[   90.852112] CPU4: shutdown
> > > <6>[   90.853135] psci: CPU4 killed (polled 0 ms)
> > > <4>[   90.856402] IRQ384: set affinity failed(-22).
> > > <4>[   90.856409] Eff. affinity 3-5 of IRQ 386 contains only offline
> > > CPUs after offlining CPU 5
> > > <2>[   90.856536] SError Interrupt on CPU5, code 0xbf000002 -- SError
> > > <4>[   90.856538] CPU: 5 PID: 37 Comm: migration/5 Tainted: G
> > > C O      5.10.72-lts-5.10.y+g886272a218dd #1
> > > <4>[   90.856539] Hardware name: Freescale i.MX8QM, Rivian TCM Board X1 (DT)
> > > <4>[   90.856541] pstate: 00000085 (nzcv daIf -PAN -UAO -TCO BTYPE=--)
> > > <4>[   90.856542] pc : msi_set_mask_bit.isra.0+0x40/0x90
> > > <4>[   90.856544] lr : msi_set_mask_bit.isra.0+0x34/0x90
> > > <4>[   90.856545] sp : ffff800012743bf0
> > > <4>[   90.856546] x29: ffff800012743bf0 x28: 0000000000000005
> > > <4>[   90.856550] x27: ffff000806260218 x26: ffff8000121c9260
> > > <4>[   90.856553] x25: ffff800011d1b000 x24: ffff8000121c9480
> > > <4>[   90.856556] x23: ffff80001225a768 x22: ffff000806260260
> > > <4>[   90.856559] x21: ffff0008062602dc x20: ffff000806260200
> > > <4>[   90.856562] x19: ffff00080286b200 x18: 0000000000000020
> > > <4>[   90.856565] x17: 0000000000000001 x16: 0000000000000019
> > > <4>[   90.856567] x15: ffff0008023812f8 x14: 6e696e696c66666f
> > > <4>[   90.856570] x13: 2072657466612073 x12: 55504320656e696c
> > > <4>[   90.856572] x11: 66666f20796c6e6f x10: 20736e6961746e6f
> > > <4>[   90.856575] x9 : ffff8000106aa084 x8 : 4920666f20352d33
> > > <4>[   90.856578] x7 : 207974696e696666 x6 : 0000000000000000
> > > <4>[   90.856581] x5 : 0000000000000000 x4 : 0000000000000000
> > > <4>[   90.856584] x3 : 0000000000000001 x2 : ffff800012d6302c
> > > <4>[   90.856586] x1 : 0000000000000001 x0 : 0000000000000000
> > > <0>[   90.856589] Kernel panic - not syncing: Asynchronous SError Interrupt
> > > <4>[   90.856591] CPU: 5 PID: 37 Comm: migration/5 Tainted: G
> > > C O      5.10.72-lts-5.10.y+g886272a218dd #1
> > > <4>[   90.856592] Hardware name: Freescale i.MX8QM
> > > <4>[   90.856594] Call trace:
> > > <4>[   90.856595]  dump_backtrace+0x0/0x1b0
> > > <4>[   90.856596]  show_stack+0x24/0x30
> > > <4>[   90.856597]  dump_stack+0xd0/0x12c
> > > <4>[   90.856598]  panic+0x178/0x380
> > > <4>[   90.856600]  nmi_panic+0x98/0xa0
> > > <4>[   90.856601]  arm64_serror_panic+0x8c/0x98
> > > <4>[   90.856602]  do_serror+0x64/0x6c
> > > <4>[   90.856603]  el1_error+0x90/0x110
> > > <4>[   90.856604]  msi_set_mask_bit.isra.0+0x40/0x90
> > > <4>[   90.856605]  pci_msi_mask_irq+0x2c/0x40
> > > <4>[   90.856606]  dw_msi_mask_irq+0x24/0x40
> > > <4>[   90.856607]  irq_shutdown+0xa4/0xe0
> > > <4>[   90.856609]  irq_shutdown_and_deactivate+0x24/0x3c
> > > <4>[   90.856610]  irq_migrate_all_off_this_cpu+0x260/0x290
> > > <4>[   90.856611]  __cpu_disable+0xd8/0xf0
> > > <4>[   90.856612]  take_cpu_down+0x48/0xf0
> > > <4>[   90.856613]  multi_cpu_stop+0xb4/0x1a0
> > > <4>[   90.856614]  cpu_stopper_thread+0xa0/0x130
> > > <4>[   90.856615]  smpboot_thread_fn+0x25c/0x290
> > > <4>[   90.856616]  kthread+0x164/0x16c
> > > <4>[   90.856617]  ret_from_fork+0x10/0x30
> > > <2>[   91.856635] SMP: stopping secondary CPUs
> > > <4>[   91.856636] SMP: failed to stop secondary CPUs 0
> > > <0>[   91.856637] Kernel Offset: disabled
> > > <0>[   91.856639] CPU features: 0x0240022,2100600c
> > > <0>[   91.856640] Memory Limit: none
> > > ```
> > >
> > > Note that the exception occurs right around when we're masking MSIs,
> > > which happens because CPU5 goes offline and the kernel recognizes that
> > > there are no more CPUs left to handle those interrupts. To be a little
> > > more specific, the PC indicates that the CPU was executing a data
> > > memory barrier when the exception arrived. Source here [1] and
> > > disassembly below with my annotation showing the PC when the exception
> > > hit:
> > >
> > > ```
> > >        1b64:       b9005260        str     w0, [x19, #80]
> > >            asm volatile(ALTERNATIVE("ldr %w0, [%1]",
> > >        1b68:       f9403260        ldr     x0, [x19, #96]
> > >        1b6c:       b9400000        ldr     w0, [x0]
> > >                    readl(desc->mask_base);         /* Flush write to
> > > device */
> > >        1b70:       d5033dbf        dmb     ld      <--- PC during exception
> > >        1b74:       2a0003e0        mov     w0, w0
> > >        1b78:       ca000000        eor     x0, x0, x0
> > > ```
> > >
> > > At the point of the exception I believe that the PCIe controller is
> > > powered down, although I haven't confirmed.
>
> You can't rely on the PC reported by the exception here, as a SError
> is asynchronous. This dmb is a red herring: it has no effect on the
> target of the load, it just orders subsequent loads after it (it'd be
> different if this was a dsb). So the problem is most probably
> elsewhere.
Understood. My assumption (which I think was naive) was that the
invalid access would have occurred near in time to the exception,
hence my poking at the MSI mask bit write and read-back that occur
just before the DMB.

>
> A read may be a reasonable indication (assuming the SError is caused
> because the read times-out), but it could also have been caused by
> something else *earlier* in the instruction stream. Or be generated by
> something else altogether, outside of the CPU.
>
> > >
> > > - - -
> > >
> > > I'm trying to understand what's going wrong but I've hit a wall. I
> > > thought the act of writing to the MSI_MASK bit while the HC is powered
> > > down is what caused the issue, but I hacked a fix to avoid calling
> > > `msi_set_mask_bit()` during suspend-to-RAM and still got the
> > > exception.
>
> Right, here you go. So either this is caused earlier in the suspend
> process, or by something on the device side. It would help if you
> could bisect this.
>
> > > At this point I'm wondering why we mask MSIs so late in the suspend
> > > process (right when the last non-boot CPU is taken offline). Shouldn't
> > > we disable/mask these IRQs as part of host controller suspend?
>
> This is what we do for *all* interrupts. The question is really
> whether there is any form of PM involved when accessing the device
> implementing the masking.
I'm afraid I didn't understand this part. What is it that we do for
all interrupts? Mask them when the last CPU in their affinity mask is
taken offline, or mask them as part of the suspend process of the
owning device?
>
> Another thing that I have wondered about for a long while is that
> there is a pattern that has been cargo-culted all over the tree
> (including by me), where masking an MSI is done both in the interrupt
> controller and in the endpoint. For example:
>
> static void dw_msi_mask_irq(struct irq_data *d)
> {
>         pci_msi_mask_irq(d);
>         irq_chip_mask_parent(d);
> }
>
> static void dw_msi_unmask_irq(struct irq_data *d)
> {
>         pci_msi_unmask_irq(d);
>         irq_chip_unmask_parent(d);
> }
>
> Normally, we should be able to limit only one of these should be
> called (because masking both in the irqchip and in the PCI device is
> redundant, and there is no guarantee that a non-MSI-X device
> implements masking anyway), and I'd expect that removing
> pci_msi_[un]mask_irq() could help here.
The logic behind this makes sense. If I'm reading the source
correctly, the parent irqchip is `dw_pci_msi_bottom_irq_chip` and its
`.irq_unmask` hook also writes to the RC. I will still try this.
>
> > > I'm also wondering if maybe the PCIe _device_ - the SSD - is writing
> > > into host controller memory after the HC is powered down, which could
> > > cause an exception on the data bus that would obviously be
> > > asynchronous to the CPU. But I've just started learning about PCIe and
> > > I have a very fuzzy understanding of how data flows during an MSI.
>
> If the RC is down, I'm not sure how it could propagate errors back to
> the CPU. But hey, I've seen more bizarre things...
>
> HTH,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ARMv8 SError, panic around msi_set_mask_bit during suspend-to-RAM
  2022-09-29 10:07 ` Manivannan Sadhasivam
@ 2022-09-29 20:22   ` Kevin Rowland
  0 siblings, 0 replies; 7+ messages in thread
From: Kevin Rowland @ 2022-09-29 20:22 UTC (permalink / raw)
  To: Manivannan Sadhasivam; +Cc: linux-pci

On Thu, Sep 29, 2022 at 3:07 AM Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
>
> On Wed, Sep 28, 2022 at 12:01:55PM -0700, Kevin Rowland wrote:
> > Hello all.
> >
> > I'm running a custom kernel based on 5.10.72 on an NXP i.MX8QM (an
> > ARMv8-A system). The i.MX8QM has 2 designware PCIe 3.0 controllers
> > on-chip. The driver in use is `drivers/pci/controller/dwc/pci-imx6.c`.
> >
> > By default the PCIe controller's Link Capabilities are set to support
> > ASPM L0s only, although the datasheet indicates support for both L0s
> > and L1. I recently modified the host controller driver to set LnkCap
> > to indicate ASPM L1 and L0s; one controller is attached to an SSD that
> > indicates support for L1s, so changing the host controller LnkCap
> > allowed the `aspm_enabled` bitmask to become non-zero and the ASPM
> > bits to be set in the LnkCtrl registers on the host and the SSD. This
> > is the good news.
> >
> > The bad news is that after the LnkCap change, when I attempt to
> > suspend-to-RAM, I get an asynchronous exception (an SError to be
> > specific) at the CPU, resulting in a kernel panic. Here's the tail end
> > of the pstore log that was captured from the exception and resulting
> > panic:
> > ```
> > <6>[   90.850564] psci: CPU3 killed (polled 4 ms)
> > <4>[   90.851989] IRQ 75: no longer affine to CPU4
> > <4>[   90.852062] IRQ384: set affinity failed(-22).
> > <4>[   90.852065] IRQ386: set affinity failed(-22).
> > <5>[   90.852112] CPU4: shutdown
> > <6>[   90.853135] psci: CPU4 killed (polled 0 ms)
> > <4>[   90.856402] IRQ384: set affinity failed(-22).
> > <4>[   90.856409] Eff. affinity 3-5 of IRQ 386 contains only offline
> > CPUs after offlining CPU 5
> > <2>[   90.856536] SError Interrupt on CPU5, code 0xbf000002 -- SError
> > <4>[   90.856538] CPU: 5 PID: 37 Comm: migration/5 Tainted: G
> > C O      5.10.72-lts-5.10.y+g886272a218dd #1
> > <4>[   90.856539] Hardware name: Freescale i.MX8QM, Rivian TCM Board X1 (DT)
> > <4>[   90.856541] pstate: 00000085 (nzcv daIf -PAN -UAO -TCO BTYPE=--)
> > <4>[   90.856542] pc : msi_set_mask_bit.isra.0+0x40/0x90
> > <4>[   90.856544] lr : msi_set_mask_bit.isra.0+0x34/0x90
> > <4>[   90.856545] sp : ffff800012743bf0
> > <4>[   90.856546] x29: ffff800012743bf0 x28: 0000000000000005
> > <4>[   90.856550] x27: ffff000806260218 x26: ffff8000121c9260
> > <4>[   90.856553] x25: ffff800011d1b000 x24: ffff8000121c9480
> > <4>[   90.856556] x23: ffff80001225a768 x22: ffff000806260260
> > <4>[   90.856559] x21: ffff0008062602dc x20: ffff000806260200
> > <4>[   90.856562] x19: ffff00080286b200 x18: 0000000000000020
> > <4>[   90.856565] x17: 0000000000000001 x16: 0000000000000019
> > <4>[   90.856567] x15: ffff0008023812f8 x14: 6e696e696c66666f
> > <4>[   90.856570] x13: 2072657466612073 x12: 55504320656e696c
> > <4>[   90.856572] x11: 66666f20796c6e6f x10: 20736e6961746e6f
> > <4>[   90.856575] x9 : ffff8000106aa084 x8 : 4920666f20352d33
> > <4>[   90.856578] x7 : 207974696e696666 x6 : 0000000000000000
> > <4>[   90.856581] x5 : 0000000000000000 x4 : 0000000000000000
> > <4>[   90.856584] x3 : 0000000000000001 x2 : ffff800012d6302c
> > <4>[   90.856586] x1 : 0000000000000001 x0 : 0000000000000000
> > <0>[   90.856589] Kernel panic - not syncing: Asynchronous SError Interrupt
> > <4>[   90.856591] CPU: 5 PID: 37 Comm: migration/5 Tainted: G
> > C O      5.10.72-lts-5.10.y+g886272a218dd #1
> > <4>[   90.856592] Hardware name: Freescale i.MX8QM
> > <4>[   90.856594] Call trace:
> > <4>[   90.856595]  dump_backtrace+0x0/0x1b0
> > <4>[   90.856596]  show_stack+0x24/0x30
> > <4>[   90.856597]  dump_stack+0xd0/0x12c
> > <4>[   90.856598]  panic+0x178/0x380
> > <4>[   90.856600]  nmi_panic+0x98/0xa0
> > <4>[   90.856601]  arm64_serror_panic+0x8c/0x98
> > <4>[   90.856602]  do_serror+0x64/0x6c
> > <4>[   90.856603]  el1_error+0x90/0x110
> > <4>[   90.856604]  msi_set_mask_bit.isra.0+0x40/0x90
> > <4>[   90.856605]  pci_msi_mask_irq+0x2c/0x40
> > <4>[   90.856606]  dw_msi_mask_irq+0x24/0x40
> > <4>[   90.856607]  irq_shutdown+0xa4/0xe0
> > <4>[   90.856609]  irq_shutdown_and_deactivate+0x24/0x3c
> > <4>[   90.856610]  irq_migrate_all_off_this_cpu+0x260/0x290
> > <4>[   90.856611]  __cpu_disable+0xd8/0xf0
> > <4>[   90.856612]  take_cpu_down+0x48/0xf0
> > <4>[   90.856613]  multi_cpu_stop+0xb4/0x1a0
> > <4>[   90.856614]  cpu_stopper_thread+0xa0/0x130
> > <4>[   90.856615]  smpboot_thread_fn+0x25c/0x290
> > <4>[   90.856616]  kthread+0x164/0x16c
> > <4>[   90.856617]  ret_from_fork+0x10/0x30
> > <2>[   91.856635] SMP: stopping secondary CPUs
> > <4>[   91.856636] SMP: failed to stop secondary CPUs 0
> > <0>[   91.856637] Kernel Offset: disabled
> > <0>[   91.856639] CPU features: 0x0240022,2100600c
> > <0>[   91.856640] Memory Limit: none
> > ```
> >
> > Note that the exception occurs right around when we're masking MSIs,
> > which happens because CPU5 goes offline and the kernel recognizes that
> > there are no more CPUs left to handle those interrupts. To be a little
> > more specific, the PC indicates that the CPU was executing a data
> > memory barrier when the exception arrived. Source here [1] and
> > disassembly below with my annotation showing the PC when the exception
> > hit:
> >
> > ```
> >        1b64:       b9005260        str     w0, [x19, #80]
> >            asm volatile(ALTERNATIVE("ldr %w0, [%1]",
> >        1b68:       f9403260        ldr     x0, [x19, #96]
> >        1b6c:       b9400000        ldr     w0, [x0]
> >                    readl(desc->mask_base);         /* Flush write to
> > device */
> >        1b70:       d5033dbf        dmb     ld      <--- PC during exception
> >        1b74:       2a0003e0        mov     w0, w0
> >        1b78:       ca000000        eor     x0, x0, x0
> > ```
> >
> > At the point of the exception I believe that the PCIe controller is
> > powered down, although I haven't confirmed.
>
> Does the pcie-imx6 driver you are using has the PM support? On the Qcom

Yes, the pci-imx6 driver installs noirq suspend and resume hooks. In
the suspend hook the PCIe-related clocks are disabled.

> platforms, we are facing a similar issue where we powerdown the PCIe
> controller in the suspend callback of the driver and hitting the SError
> while the kernel tries to mask the MSI during offlining of the boot CPU.
>
> For working around this issue, we used syscore_ops() suspend/resume
> callbacks in the Qcom PCIe driver.

I'm reading your patches from "PCI: qcom: Add system suspend and
resume support" and they do look promising. I'll try to port them to
the pci-imx6 driver, at which point I can help test as the patches are
revised.

In that patch series you said:
"Its observed that access to Ep PCIe space to mask MSI/MSIX is happening
at the very late stage of suspend path"

Just as a point of clarification for me, when you say "Ep PCIe space"
are you talking about the _device_ endpoint on the other side of the
link from the host? Or the endpoint on the host side?

>
> One might argue that we should not power down the PCIe controller during
> suspend, but that's not going to be super efficient in power constraint
> devices like mobile phones, laptops...
>
> >
> > - - -
> >
> > I'm trying to understand what's going wrong but I've hit a wall. I
> > thought the act of writing to the MSI_MASK bit while the HC is powered
> > down is what caused the issue, but I hacked a fix to avoid calling
> > `msi_set_mask_bit()` during suspend-to-RAM and still got the
> > exception.
> >
> > At this point I'm wondering why we mask MSIs so late in the suspend
> > process (right when the last non-boot CPU is taken offline). Shouldn't
> > we disable/mask these IRQs as part of host controller suspend?
> >
> > I'm also wondering if maybe the PCIe _device_ - the SSD - is writing
> > into host controller memory after the HC is powered down, which could
> > cause an exception on the data bus that would obviously be
> > asynchronous to the CPU. But I've just started learning about PCIe and
> > I have a very fuzzy understanding of how data flows during an MSI.
> >
>
> Well, probability is very low. During testing on our platforms, we
> found that the NVMe device was still transmitting messages (don't remember
> which one atm) even after suspending it from the NVMe driver.
>
> But that should not cause any SError on the host CPU unless the PCIe
> device writes to the unclocked memory. Usually, the PCIe controller/phy
> will terminate those messages if the controller is powered down.
>
> Thanks,
> Mani
>
> > Any advice would be welcome.
> >
> > Thanks,
> > Kevin
> >
> > [1] https://source.codeaurora.org/external/imx/linux-imx/tree/drivers/pci/msi.c?h=lf-5.10.y#n243

Thanks,
Kevin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ARMv8 SError, panic around msi_set_mask_bit during suspend-to-RAM
  2022-09-29 20:07     ` Kevin Rowland
@ 2022-09-30 12:53       ` Marc Zyngier
  0 siblings, 0 replies; 7+ messages in thread
From: Marc Zyngier @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Kevin Rowland
  Cc: Bjorn Helgaas, linux-pci, Thomas Gleixner, Manivannan Sadhasivam,
	Krishna Chaitanya Chundru

On Thu, 29 Sep 2022 21:07:50 +0100,
Kevin Rowland <kevin.p.rowland@gmail.com> wrote:
> 
> On Thu, Sep 29, 2022 at 2:08 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Wed, 28 Sep 2022 17:56:01 -0400,
> > Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > > At this point I'm wondering why we mask MSIs so late in the suspend
> > > > process (right when the last non-boot CPU is taken offline). Shouldn't
> > > > we disable/mask these IRQs as part of host controller suspend?
> >
> > This is what we do for *all* interrupts. The question is really
> > whether there is any form of PM involved when accessing the device
> > implementing the masking.
> I'm afraid I didn't understand this part. What is it that we do for
> all interrupts? Mask them when the last CPU in their affinity mask is
> taken offline, or mask them as part of the suspend process of the
> owning device?

Sorry, I wasn't clear at all: *managed* interrupts (which are per-CPU
interrupts) are disabled when the corresponding CPU goes down. If you
have some extra logic to pull the plug on the RC when the last CPU
goes away, it must happen after this disabling phase.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-09-30 12:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-28 19:01 ARMv8 SError, panic around msi_set_mask_bit during suspend-to-RAM Kevin Rowland
2022-09-28 21:56 ` Bjorn Helgaas
2022-09-29  9:08   ` Marc Zyngier
2022-09-29 20:07     ` Kevin Rowland
2022-09-30 12:53       ` Marc Zyngier
2022-09-29 10:07 ` Manivannan Sadhasivam
2022-09-29 20:22   ` Kevin Rowland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).