linux-pci.vger.kernel.org archive mirror
* [RFC] PCI/MSI: Warning observed for NVMe with ACPI
@ 2021-12-10 10:48 Jon Hunter
  2021-12-10 11:39 ` Marc Zyngier
  0 siblings, 1 reply; 4+ messages in thread
From: Jon Hunter @ 2021-12-10 10:48 UTC
  To: Bjorn Helgaas, lorenzo.pieralisi, Marc Zyngier; +Cc: linux-pci, linux-tegra

Hi all,

Since Linux v5.13, we have noticed the following warning splat when
booting Tegra (ARM64) with ACPI ...

[    2.725479] WARNING: CPU: 0 PID: 94 at include/linux/msi.h:264 free_msi_irqs+0x84/0x188
[    2.736137] Modules linked in:
[    2.736147] CPU: 0 PID: 94 Comm: kworker/u16:1 Tainted: G        W         5.12.0-rc2-00008-g658376bd3e5-dirty #36
[    2.736160] Workqueue: nvme-reset-wq nvme_reset_work
[    2.746470] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[    2.757713] pc : free_msi_irqs+0x84/0x188
[    2.757726] lr : __pci_enable_msix_range+0x380/0x530
[    2.757735] sp : ffff800012813b00
[    2.757739] x29: ffff800012813b00
[    2.768371] x28: 00000000ffffffed
[    2.768382] x27: 0000000000000001 x26: 0000000000000000
[    2.768393] x25: ffff0000809362e8 x24: 0000000000000000
[    2.768407] x23: 000000000000000c x22: ffff000080936000
[    2.768418] x21: ffff0000809362e8 x20: ffff0000809362e8
[    2.775320] x19: ffff000080936000
[    2.785950] x18: ffffffffffffffff
[    2.785961] x17: 0000000000000007 x16: 0000000000000001
[    2.785975] x15: ffff800011bf9948
[    2.793997] x14: ffff8000928137e7
[    2.794009] x13: ffff8000128137f5 x12: ffff800011c19640
[    2.794023] x11: fffffffffffe5788 x10: 0000000005f5e0ff
[    2.794034] x9 : 00000000ffffffd0 x8 : 203a737542204f49
[    2.803737] x7 : 444d206465786946 x6 : ffff800011ee1fd7
[    2.803750] x5 : 0000000000000000 x4 : 0000000000000000
[    2.815286] x3 : 00000000ffffffff x2 : ffff0000809362e8
[    2.815300] x1 : ffff0000809362e8 x0 : 0000000000000000
[    2.825270] Call trace:
[    2.825275]  free_msi_irqs+0x84/0x188
[    2.825288]  __pci_enable_msix_range+0x380/0x530
[    2.825299]  pci_alloc_irq_vectors_affinity+0x158/0x168
[    2.825309]  nvme_reset_work+0x214/0x15b8
[    2.829340] dwc-eth-dwmac NVDA1160:00: SPH feature enabled
[    2.832986]  process_one_work+0x1cc/0x360
[    2.833002]  worker_thread+0x48/0x450
[    2.833012]  kthread+0x120/0x150
[    2.833020]  ret_from_fork+0x10/0x18


Bisecting this, I found that it started to occur because, as of Linux
v5.13, CONFIG_PCI_MSI_ARCH_FALLBACKS is no longer enabled by default.
Previously it only happened to be enabled because Renesas R-Car was
enabling it.

When booting with ACPI, I see that pci_msi_setup_msi_irqs() ends up
calling arch_setup_msi_irqs() and, if CONFIG_PCI_MSI_ARCH_FALLBACKS is
not enabled, this triggers WARN_ON_ONCE(1).
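
A rough userspace sketch of the path described above may help. It is a
simplified model under stated assumptions, not the kernel source: the
model_* structs, helpers and the warning text are made up for
illustration, and the domain-based allocation is simply assumed to
succeed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's irq_domain and pci_dev. */
struct model_domain {
	bool hierarchical;
};

struct model_dev {
	const struct model_domain *msi_domain; /* NULL: no MSI domain attached */
};

/* Number of times the modelled warning has been printed. */
static int model_warn_count;

/*
 * Models the stub that stands in for the arch hook when
 * CONFIG_PCI_MSI_ARCH_FALLBACKS is disabled: warn once, then fail.
 */
static int model_arch_setup_msi_irqs(const struct model_dev *dev, int nvec)
{
	static bool warned;

	(void)dev;
	(void)nvec;
	if (!warned) {
		warned = true;
		model_warn_count++;
		fprintf(stderr, "WARNING: no arch MSI fallback\n");
	}
	return -ENODEV;
}

/* Models pci_msi_setup_msi_irqs(): prefer the MSI domain, else fall back. */
static int model_pci_msi_setup_msi_irqs(const struct model_dev *dev, int nvec)
{
	if (dev->msi_domain && dev->msi_domain->hierarchical)
		return 0; /* domain-based allocation, assumed to succeed here */

	return model_arch_setup_msi_irqs(dev, nvec);
}
```

In this model a device with a hierarchical MSI domain never reaches the
stub, while a device without one gets -ENODEV and a single warning, no
matter how many times the setup is retried.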

So the question is, should this be enabled by default for ARM64? I see
a lot of other architectures enabling this when PCI_MSI is enabled. So
I am wondering if we should be doing something like ...

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1f212b47a48a..4bbd81bab809 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -202,6 +202,7 @@ config ARM64
         select PCI_DOMAINS_GENERIC if PCI
         select PCI_ECAM if (ACPI && PCI)
         select PCI_SYSCALL if PCI
+       select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
         select POWER_RESET
         select POWER_SUPPLY
         select SPARSE_IRQ

Cheers
Jon

-- 
nvpublic


* Re: [RFC] PCI/MSI: Warning observed for NVMe with ACPI
  2021-12-10 10:48 [RFC] PCI/MSI: Warning observed for NVMe with ACPI Jon Hunter
@ 2021-12-10 11:39 ` Marc Zyngier
  2021-12-10 12:25   ` Jon Hunter
  2021-12-11  9:50   ` Thomas Gleixner
  0 siblings, 2 replies; 4+ messages in thread
From: Marc Zyngier @ 2021-12-10 11:39 UTC
  To: Jon Hunter
  Cc: Bjorn Helgaas, lorenzo.pieralisi, Thomas Gleixner, linux-pci,
	linux-tegra

Hi Jon,

On Fri, 10 Dec 2021 10:48:22 +0000,
Jon Hunter <jonathanh@nvidia.com> wrote:
> 
> Hi all,
> 
> Since Linux v5.13, we have noticed the following warning splat when
> booting Tegra (ARM64) with ACPI ...
> 
> [    2.725479] WARNING: CPU: 0 PID: 94 at include/linux/msi.h:264 free_msi_irqs+0x84/0x188
> [    2.736137] Modules linked in:
> [    2.736147] CPU: 0 PID: 94 Comm: kworker/u16:1 Tainted: G        W         5.12.0-rc2-00008-g658376bd3e5-dirty #36
> [    2.736160] Workqueue: nvme-reset-wq nvme_reset_work
> [    2.746470] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> [    2.757713] pc : free_msi_irqs+0x84/0x188
> [    2.757726] lr : __pci_enable_msix_range+0x380/0x530
> [    2.757735] sp : ffff800012813b00
> [    2.757739] x29: ffff800012813b00
> [    2.768371] x28: 00000000ffffffed
> [    2.768382] x27: 0000000000000001 x26: 0000000000000000
> [    2.768393] x25: ffff0000809362e8 x24: 0000000000000000
> [    2.768407] x23: 000000000000000c x22: ffff000080936000
> [    2.768418] x21: ffff0000809362e8 x20: ffff0000809362e8
> [    2.775320] x19: ffff000080936000
> [    2.785950] x18: ffffffffffffffff
> [    2.785961] x17: 0000000000000007 x16: 0000000000000001
> [    2.785975] x15: ffff800011bf9948
> [    2.793997] x14: ffff8000928137e7
> [    2.794009] x13: ffff8000128137f5 x12: ffff800011c19640
> [    2.794023] x11: fffffffffffe5788 x10: 0000000005f5e0ff
> [    2.794034] x9 : 00000000ffffffd0 x8 : 203a737542204f49
> [    2.803737] x7 : 444d206465786946 x6 : ffff800011ee1fd7
> [    2.803750] x5 : 0000000000000000 x4 : 0000000000000000
> [    2.815286] x3 : 00000000ffffffff x2 : ffff0000809362e8
> [    2.815300] x1 : ffff0000809362e8 x0 : 0000000000000000
> [    2.825270] Call trace:
> [    2.825275]  free_msi_irqs+0x84/0x188
> [    2.825288]  __pci_enable_msix_range+0x380/0x530
> [    2.825299]  pci_alloc_irq_vectors_affinity+0x158/0x168
> [    2.825309]  nvme_reset_work+0x214/0x15b8
> [    2.829340] dwc-eth-dwmac NVDA1160:00: SPH feature enabled
> [    2.832986]  process_one_work+0x1cc/0x360
> [    2.833002]  worker_thread+0x48/0x450
> [    2.833012]  kthread+0x120/0x150
> [    2.833020]  ret_from_fork+0x10/0x18
> 
> 
> Bisecting this, I found that it started to occur because, as of Linux
> v5.13, CONFIG_PCI_MSI_ARCH_FALLBACKS is no longer enabled by default.
> Previously it only happened to be enabled because Renesas R-Car was
> enabling it.
> 
> When booting with ACPI, I see that pci_msi_setup_msi_irqs() ends up
> calling arch_setup_msi_irqs() and, if CONFIG_PCI_MSI_ARCH_FALLBACKS is
> not enabled, this triggers WARN_ON_ONCE(1).
> 
> So the question is, should this be enabled by default for ARM64? I see
> a lot of other architectures enabling this when PCI_MSI is enabled. So
> I am wondering if we should be doing something like ...
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1f212b47a48a..4bbd81bab809 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -202,6 +202,7 @@ config ARM64
>         select PCI_DOMAINS_GENERIC if PCI
>         select PCI_ECAM if (ACPI && PCI)
>         select PCI_SYSCALL if PCI
> +       select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
>         select POWER_RESET
>         select POWER_SUPPLY
>         select SPARSE_IRQ

+Thomas, as he's neck-deep in the MSI rework.

No, this definitely is the wrong solution.

arm64 doesn't need any arch fallback (I actually went out of my way to
kill them on this architecture), and requires the individual MSI
controller drivers to do the right thing by using MSI domains.  Adding
this config option makes the warning disappear, but the core issue is
that you have a device that doesn't have a MSI domain associated with
it.

So either your device isn't MSI capable (odd), your host bridge
doesn't make the link with the MSI controller to advertise the MSI
domain (this should normally be dealt with via IORT), or there is a
bug of a similar sort somewhere else.

Getting to the root of this issue would be the right thing to do.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [RFC] PCI/MSI: Warning observed for NVMe with ACPI
  2021-12-10 11:39 ` Marc Zyngier
@ 2021-12-10 12:25   ` Jon Hunter
  2021-12-11  9:50   ` Thomas Gleixner
  1 sibling, 0 replies; 4+ messages in thread
From: Jon Hunter @ 2021-12-10 12:25 UTC
  To: Marc Zyngier
  Cc: Bjorn Helgaas, lorenzo.pieralisi, Thomas Gleixner, linux-pci,
	linux-tegra, Vidya Sagar

Hi Marc,

On 10/12/2021 11:39, Marc Zyngier wrote:

...

>> So the question is, should this be enabled by default for ARM64? I see
>> a lot of other architectures enabling this when PCI_MSI is enabled. So
>> I am wondering if we should be doing something like ...
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 1f212b47a48a..4bbd81bab809 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -202,6 +202,7 @@ config ARM64
>>          select PCI_DOMAINS_GENERIC if PCI
>>          select PCI_ECAM if (ACPI && PCI)
>>          select PCI_SYSCALL if PCI
>> +       select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
>>          select POWER_RESET
>>          select POWER_SUPPLY
>>          select SPARSE_IRQ
> 
> +Thomas, as he's neck-deep in the MSI rework.
> 
> No, this definitely is the wrong solution.
> 
> arm64 doesn't need any arch fallback (I actually went out of my way to
> kill them on this architecture), and requires the individual MSI
> controller drivers to do the right thing by using MSI domains.  Adding
> this config option makes the warning disappear, but the core issue is
> that you have a device that doesn't have a MSI domain associated with
> it.
> 
> So either your device isn't MSI capable (odd), your host bridge
> doesn't make the link with the MSI controller to advertise the MSI
> domain (this should normally be dealt with via IORT), or there is a
> bug of a similar sort somewhere else.
> 
> Getting to the root of this issue would be the right thing to do.


Thanks! I will chat with Sagar about this and see what we are missing.

Jon

-- 
nvpublic


* Re: [RFC] PCI/MSI: Warning observed for NVMe with ACPI
  2021-12-10 11:39 ` Marc Zyngier
  2021-12-10 12:25   ` Jon Hunter
@ 2021-12-11  9:50   ` Thomas Gleixner
  1 sibling, 0 replies; 4+ messages in thread
From: Thomas Gleixner @ 2021-12-11  9:50 UTC
  To: Marc Zyngier, Jon Hunter
  Cc: Bjorn Helgaas, lorenzo.pieralisi, linux-pci, linux-tegra

On Fri, Dec 10 2021 at 11:39, Marc Zyngier wrote:
> On Fri, 10 Dec 2021 10:48:22 +0000,
> Jon Hunter <jonathanh@nvidia.com> wrote:
>> +       select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
>>         select POWER_RESET
>>         select POWER_SUPPLY
>>         select SPARSE_IRQ
>
> +Thomas, as he's neck-deep in the MSI rework.
>
> No, this definitely is the wrong solution.

Correct.

> arm64 doesn't need any arch fallback (I actually went out of my way to
> kill them on this architecture), and requires the individual MSI
> controller drivers to do the right thing by using MSI domains.  Adding
> this config option makes the warning disappear, but the core issue is
> that you have a device that doesn't have a MSI domain associated with
> it.
>
> So either your device isn't MSI capable (odd), your host bridge
> doesn't make the link with the MSI controller to advertise the MSI
> domain (this should normally be dealt with via IORT), or there is a
> bug of a similar sort somewhere else.

What's even more odd is:

>> [    2.725479] WARNING: CPU: 0 PID: 94 at include/linux/msi.h:264 free_msi_irqs+0x84/0x188
>> [    2.825275]  free_msi_irqs+0x84/0x188
>> [    2.825288]  __pci_enable_msix_range+0x380/0x530
>> [    2.825299]  pci_alloc_irq_vectors_affinity+0x158/0x168

From __pci_enable_msix_range() there are two ways to reach free_msi_irqs():

1) pci_alloc_irq_vectors_affinity()
     __pci_enable_msix_range()
       __pci_enable_msix()
         msix_capability_init()
           msix_setup_entries()
             alloc_msi_entry()       -> allocation fail

2) pci_alloc_irq_vectors_affinity()
     __pci_enable_msix_range()
       __pci_enable_msix()
         msix_capability_init()
           pci_msi_setup_msi_irqs(); -> any failure after this succeeded

#1 is unlikely

#2 is odd because if the irqdomain of the device is not hierarchical,
   then the same warning should trigger already in
   pci_msi_setup_msi_irqs() via arch_setup_msi_irqs().
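
The ordering argument in #2 can be sketched as a small userspace model.
This is only an illustration under assumptions: the sketch_* names and
the event strings are hypothetical, and it assumes that, without a
hierarchical domain, both the setup and the teardown side end up in a
warning stub.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_EVENTS 4

/* Ordered log of which modelled stub warned. */
static const char *sketch_events[SKETCH_MAX_EVENTS];
static int sketch_nr_events;

static void sketch_record(const char *what)
{
	if (sketch_nr_events < SKETCH_MAX_EVENTS)
		sketch_events[sketch_nr_events++] = what;
}

/* Setup side: without a hierarchical domain, the stub warns and fails. */
static int sketch_setup_msi_irqs(bool has_hier_domain)
{
	if (has_hier_domain)
		return 0;
	sketch_record("setup-stub-warn");
	return -ENODEV;
}

/* Teardown side: without a hierarchical domain, the stub warns as well. */
static void sketch_free_msi_irqs(bool has_hier_domain)
{
	if (!has_hier_domain)
		sketch_record("teardown-stub-warn");
}

/* Path #2: a failed setup unwinds through the free path. */
static int sketch_msix_capability_init(bool has_hier_domain)
{
	int ret = sketch_setup_msi_irqs(has_hier_domain);

	if (ret) {
		sketch_free_msi_irqs(has_hier_domain);
		return ret;
	}
	return 0;
}
```

In this model the setup-side event is always recorded before the
teardown-side one, which is why a splat that shows only free_msi_irqs()
looks strange.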

Strange.

Thanks,

        tglx



