linux-next.vger.kernel.org archive mirror
* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
       [not found] ` <20200826112333.992429909@linutronix.de>
@ 2020-09-25 13:54   ` Qian Cai
  2020-09-26 12:38     ` Vasily Gorbik
  0 siblings, 1 reply; 7+ messages in thread
From: Qian Cai @ 2020-09-25 13:54 UTC (permalink / raw)
  To: Thomas Gleixner, LKML, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, linux-s390, Stephen Rothwell, linux-next
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, 2020-08-26 at 13:17 +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
> requires them or not. Architectures which are fully utilizing hierarchical
> irq domains should never call into that code.
> 
> It's not only architectures which depend on that by implementing one or
> more of the weak functions, there is also a bunch of drivers which relies
> on the weak functions which invoke msi_controller::setup_irq[s] and
> msi_controller::teardown_irq.
> 
> Make the architectures and drivers which rely on them select them in Kconfig
> and if not selected replace them by stub functions which emit a warning and
> fail the PCI/MSI interrupt allocation.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Today's linux-next will have some warnings on s390x:

.config: https://gitlab.com/cailca/linux-mm/-/blob/master/s390.config

WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS
  Depends on [n]: PCI [=n]
  Selected by [y]:
  - S390 [=y]

WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS
  Depends on [n]: PCI [=n]
  Selected by [y]:
  - S390 [=y]

> ---
> V2: Make the architectures (and drivers) which need the fallbacks select them
>     and not the other way round (Bjorn).
> ---
>  arch/ia64/Kconfig              |    1 +
>  arch/mips/Kconfig              |    1 +
>  arch/powerpc/Kconfig           |    1 +
>  arch/s390/Kconfig              |    1 +
>  arch/sparc/Kconfig             |    1 +
>  arch/x86/Kconfig               |    1 +
>  drivers/pci/Kconfig            |    3 +++
>  drivers/pci/controller/Kconfig |    3 +++
>  drivers/pci/msi.c              |    3 ++-
>  include/linux/msi.h            |   31 ++++++++++++++++++++++++++-----
>  10 files changed, 40 insertions(+), 6 deletions(-)
> 
> --- a/arch/ia64/Kconfig
> +++ b/arch/ia64/Kconfig
> @@ -56,6 +56,7 @@ config IA64
>  	select NEED_DMA_MAP_STATE
>  	select NEED_SG_DMA_LENGTH
>  	select NUMA if !FLATMEM
> +	select PCI_MSI_ARCH_FALLBACKS
>  	default y
>  	help
>  	  The Itanium Processor Family is Intel's 64-bit successor to
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -86,6 +86,7 @@ config MIPS
>  	select MODULES_USE_ELF_REL if MODULES
>  	select MODULES_USE_ELF_RELA if MODULES && 64BIT
>  	select PERF_USE_VMALLOC
> +	select PCI_MSI_ARCH_FALLBACKS
>  	select RTC_LIB
>  	select SYSCTL_EXCEPTION_TRACE
>  	select VIRT_TO_BUS
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -246,6 +246,7 @@ config PPC
>  	select OLD_SIGACTION			if PPC32
>  	select OLD_SIGSUSPEND
>  	select PCI_DOMAINS			if PCI
> +	select PCI_MSI_ARCH_FALLBACKS
>  	select PCI_SYSCALL			if PCI
>  	select PPC_DAWR				if PPC64
>  	select RTC_LIB
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -185,6 +185,7 @@ config S390
>  	select OLD_SIGSUSPEND3
>  	select PCI_DOMAINS		if PCI
>  	select PCI_MSI			if PCI
> +	select PCI_MSI_ARCH_FALLBACKS
>  	select SPARSE_IRQ
>  	select SYSCTL_EXCEPTION_TRACE
>  	select THREAD_INFO_IN_TASK
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -43,6 +43,7 @@ config SPARC
>  	select GENERIC_STRNLEN_USER
>  	select MODULES_USE_ELF_RELA
>  	select PCI_SYSCALL if PCI
> +	select PCI_MSI_ARCH_FALLBACKS
>  	select ODD_RT_SIGACTION
>  	select OLD_SIGSUSPEND
>  	select CPU_NO_EFFICIENT_FFS
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -225,6 +225,7 @@ config X86
>  	select NEED_SG_DMA_LENGTH
>  	select PCI_DOMAINS			if PCI
>  	select PCI_LOCKLESS_CONFIG		if PCI
> +	select PCI_MSI_ARCH_FALLBACKS
>  	select PERF_EVENTS
>  	select RTC_LIB
>  	select RTC_MC146818_LIB
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -56,6 +56,9 @@ config PCI_MSI_IRQ_DOMAIN
>  	depends on PCI_MSI
>  	select GENERIC_MSI_IRQ_DOMAIN
>  
> +config PCI_MSI_ARCH_FALLBACKS
> +	bool
> +
>  config PCI_QUIRKS
>  	default y
>  	bool "Enable PCI quirk workarounds" if EXPERT
> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ -41,6 +41,7 @@ config PCI_TEGRA
>  	bool "NVIDIA Tegra PCIe controller"
>  	depends on ARCH_TEGRA || COMPILE_TEST
>  	depends on PCI_MSI_IRQ_DOMAIN
> +	select PCI_MSI_ARCH_FALLBACKS
>  	help
>  	  Say Y here if you want support for the PCIe host controller found
>  	  on NVIDIA Tegra SoCs.
> @@ -67,6 +68,7 @@ config PCIE_RCAR_HOST
>  	bool "Renesas R-Car PCIe host controller"
>  	depends on ARCH_RENESAS || COMPILE_TEST
>  	depends on PCI_MSI_IRQ_DOMAIN
> +	select PCI_MSI_ARCH_FALLBACKS
>  	help
>  	  Say Y here if you want PCIe controller support on R-Car SoCs in host
>  	  mode.
> @@ -103,6 +105,7 @@ config PCIE_XILINX_CPM
>  	bool "Xilinx Versal CPM host bridge support"
>  	depends on ARCH_ZYNQMP || COMPILE_TEST
>  	select PCI_HOST_COMMON
> +	select PCI_MSI_ARCH_FALLBACKS
>  	help
>  	  Say 'Y' here if you want kernel support for the
>  	  Xilinx Versal CPM host bridge.
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -58,8 +58,8 @@ static void pci_msi_teardown_msi_irqs(st
>  #define pci_msi_teardown_msi_irqs	arch_teardown_msi_irqs
>  #endif
>  
> +#ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
>  /* Arch hooks */
> -
>  int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
>  {
>  	struct msi_controller *chip = dev->bus->msi;
> @@ -132,6 +132,7 @@ void __weak arch_teardown_msi_irqs(struc
>  {
>  	return default_teardown_msi_irqs(dev);
>  }
> +#endif /* CONFIG_PCI_MSI_ARCH_FALLBACKS */
>  
>  static void default_restore_msi_irq(struct pci_dev *dev, int irq)
>  {
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -193,17 +193,38 @@ void pci_msi_mask_irq(struct irq_data *d
>  void pci_msi_unmask_irq(struct irq_data *data);
>  
>  /*
> - * The arch hooks to setup up msi irqs. Those functions are
> - * implemented as weak symbols so that they /can/ be overriden by
> - * architecture specific code if needed.
> + * The arch hooks to set up msi irqs. Default functions are implemented
> + * as weak symbols so that they /can/ be overridden by architecture specific
> + * code if needed. These hooks must be enabled by the architecture or by
> + * drivers which depend on them via msi_controller based MSI handling.
> + *
> + * If CONFIG_PCI_MSI_ARCH_FALLBACKS is not selected they are replaced by
> + * stubs with warnings.
>   */
> +#ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
>  int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc);
>  void arch_teardown_msi_irq(unsigned int irq);
>  int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
>  void arch_teardown_msi_irqs(struct pci_dev *dev);
> -void arch_restore_msi_irqs(struct pci_dev *dev);
> -
>  void default_teardown_msi_irqs(struct pci_dev *dev);
> +#else
> +static inline int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> +{
> +	WARN_ON_ONCE(1);
> +	return -ENODEV;
> +}
> +
> +static inline void arch_teardown_msi_irqs(struct pci_dev *dev)
> +{
> +	WARN_ON_ONCE(1);
> +}
> +#endif
> +
> +/*
> + * The restore hooks are still available as they are useful even
> + * for fully irq domain based setups. Courtesy to XEN/X86.
> + */
> +void arch_restore_msi_irqs(struct pci_dev *dev);
>  void default_restore_msi_irqs(struct pci_dev *dev);
>  
>  struct msi_controller {
> 
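
For readers skimming the diff, the opt-in pattern in the msi.h hunk can be
tried outside the kernel. The following is only a minimal userspace sketch
under stated assumptions: struct pci_dev_sketch and warn_once() are
hypothetical stand-ins (for struct pci_dev and WARN_ON_ONCE(1)), and only one
of the hooks is shown. It is not the kernel implementation.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for struct pci_dev so the sketch builds in userspace. */
struct pci_dev_sketch {
	const char *name;
};

/* Userspace stand-in for WARN_ON_ONCE(1): complain on the first hit only. */
static int warned;

static void warn_once(const char *func)
{
	if (!warned) {
		warned = 1;
		fprintf(stderr, "WARNING: %s() called without arch fallbacks\n", func);
	}
}

#ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
/* Selected: the architecture or driver supplies the real implementation. */
int arch_setup_msi_irqs(struct pci_dev_sketch *dev, int nvec, int type);
#else
/* Not selected: warn once and fail the allocation. */
static inline int arch_setup_msi_irqs(struct pci_dev_sketch *dev, int nvec, int type)
{
	(void)dev;
	(void)nvec;
	(void)type;
	warn_once("arch_setup_msi_irqs");
	return -ENODEV;
}
#endif
```

Built without -DCONFIG_PCI_MSI_ARCH_FALLBACKS, every call takes the stub path
and returns -ENODEV, which is the "emit a warning and fail the PCI/MSI
interrupt allocation" behaviour the changelog describes.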


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
       [not found] <20200826111628.794979401@linutronix.de>
       [not found] ` <20200826112333.992429909@linutronix.de>
@ 2020-09-25 15:29 ` Qian Cai
  2020-09-25 15:49   ` Peter Zijlstra
  1 sibling, 1 reply; 7+ messages in thread
From: Qian Cai @ 2020-09-25 15:29 UTC (permalink / raw)
  To: Thomas Gleixner, LKML, Stephen Rothwell, linux-next
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, 2020-08-26 at 13:16 +0200, Thomas Gleixner wrote:
> This is the second version of providing a base to support device MSI (non
> PCI based) and on top of that support for IMS (Interrupt Message Store)
> based devices in a halfway architecture independent way.
> 
> The first version can be found here:
> 
>     https://lore.kernel.org/r/20200821002424.119492231@linutronix.de
> 
> It's still a mixed bag of bug fixes, cleanups and general improvements
> which are worthwhile independent of device MSI.

Reverting part of this patchset on top of today's linux-next fixed a
boot issue on HPE ProLiant DL560 Gen10, i.e.,

$ git revert --no-edit 13b90cadfc29..bc95fd0d7c42

.config: https://gitlab.com/cailca/linux-mm/-/blob/master/x86.config

It looks like the crashes happen in the interrupt remapping code, where they are
only able to generate partial call traces.

[    1.912386][    T0] ACPI: X2APIC_NMI (uid[0xf5] high level 9983][    T0] ... MAX_LOCK_DEPTH:          48
[    7.914876][    T0] ... MAX_LOCKDEP_KEYS:        8192
[    7.919942][    T0] ... CLASSHASH_SIZE:          4096
[    7.925009][    T0] ... MAX_LOCKDEP_ENTRIES:     32768
[    7.930163][    T0] ... MAX_LOCKDEP_CHAINS:      65536
[    7.935318][    T0] ... CHAINHASH_SIZE:          32768
[    7.940473][    T0]  memory used by lock dependency info: 6301 kB
[    7.946586][    T0]  memory used for stack traces: 4224 kB
[    7.952088][    T0]  per task-struct memory footprint: 1920 bytes
[    7.968312][    T0] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[    7.980281][    T0] ACPI: Core revision 20200717
[    7.993343][    T0] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635855245 ns
[    8.003270][    T0] APIC: Switch to symmetric I/O mode setup
[    8.008951][    T0] DMAR: Host address width 46
[    8.013512][    T0] DMAR: DRHD base: 0x000000e5ffc000 flags: 0x0
[    8.019680][    T0] DMAR: dmar0: reg_base_addr e5ffc000 ver 1:0 cap 8d2078c106f0466 [    T0] DMAR-IR: IOAPIC id 15 under DRHD base  0xe5ffc000 IOMMU 0
[    8.420990][    T0] DMAR-IR: IOAPIC id 8 under DRHD base  0xddffc000 IOMMU 15
[    8.428166][    T0] DMAR-IR: IOAPIC id 9 under DRHD base  0xddffc000 IOMMU 15
[    8.435341][    T0] DMAR-IR: HPET id 0 under DRHD base 0xddffc000
[    8.441456][    T0] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    8.457911][    T0] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    8.466614][    T0] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    8.474295][    T0] #PF: supervisor instruction fetch in kernel mode
[    8.480669][    T0] #PF: error_code(0x0010) - not-present page
[    8.486518][    T0] PGD 0 P4D 0 
[    8.489757][    T0] Oops: 0010 [#1] SMP KASAN PTI
[    8.494476][    T0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I       5.9.0-rc6-next-20200925 #2
[    8.503987][    T0] Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10, BIOS U34 11/13/2019
[    8.513238][    T0] RIP: 0010:0x0
[    8.516562][    T0] Code: Bad RIP v

or

[    2.906744][    T0] ACPI: X2API32, address 0xfec68000, GSI 128-135
[    2.907063][    T0] IOAPIC[15]: apic_id 29, version 32, address 0xfec70000, GSI 136-143
[    2.907071][    T0] IOAPIC[16]: apic_id 30, version 32, address 0xfec78000, GSI 144-151
[    2.907079][    T0] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    2.907084][    T0] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    2.907100][    T0] Using ACPI (MADT) for SMP configuration information
[    2.907105][    T0] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[    2.907116][    T0] ACPI: SPCR: console: uart,mmio,0x0,115200
[    2.907121][    T0] TSC deadline timer available
[    2.907126][    T0] smpboot: Allowing 144 CPUs, 0 hotplug CPUs
[    2.907163][    T0] [mem 0xd0000000-0xfdffffff] available for PCI devices
[    2.907175][    T0] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    2.914541][    T0] setup_percpu: NR_CPUS:256 nr_cpumask_bits:144 nr_cpu_ids:144 nr_node_ids:4
[    2.926109][   466 ecap f020df
[    9.134709][    T0] DMAR: DRHD base: 0x000000f5ffc000 flags: 0x0
[    9.140867][    T0] DMAR: dmar8: reg_base_addr f5ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    9.149610][    T0] DMAR: DRHD base: 0x000000f7ffc000 flags: 0x0
[    9.155762][    T0] DMAR: dmar9: reg_base_addr f7ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    9.164491][    T0] DMAR: DRHD base: 0x000000f9ffc000 flags: 0x0
[    9.170645][    T0] DMAR: dmar10: reg_base_addr f9ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    9.179476][    T0] DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
[    9.185626][    T0] DMAR: dmar11: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    9.194442][    T0] DMAR: DRHD base: 0x000000dfffc000 flags: 0x0
[    9.200587][    T0] DMAR: dmar12: reg_base_addr dfffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    9.209418][    T0] DMAR: DRHD base: 0x000000e1ffc000 flags: 0x0
[    9.215551][    T0] DMAR: dmar13: reg_base_addr e1ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    9.224367][    T0] DMAR: DRHD base: 0x000000e3ffc83][    T0]  msi_domain_alloc+0x8e/0x280
[    9.615015][    T0]  __irq_domain_a8992cd
[    9.711906][    T0] R10: ffffffff85407d78 R11: fffffbfff18992cc R12: ffffffff8546ffc0
[    9.719761][    T0] R13: 0000000000000098 R14: ffff888106e63a40 R15: 0000000000000001
[    9.727617][    T0] FS:  0000000000000000(0000) GS:ffff8887df800000(0000) knlGS:0000000000000000
[    9.736431][    T0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.742892][    T0] CR2: ffffffffffffffd6 CR3: 0000001ba7814001 CR4: 00000000000606b0
[    9.750747][    T0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    9.758601][    T0] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    9.766456][    T0] Kernel panic - not syncing: Fatal exception
[    9.772547][    T0] ---[ end Kernel panic - not syncing: Fatal exception ]---

The working boot (without those patches) looks like this:

[    1.913963][    T0] ACPI: X2APIC_NMI (uid[0xf4] high level lint[0x1])
[    1.913967][    T0] ACPI: X2APIC_NMI (uid[0xf5] high level lint[0x1])
[    1.913970][    T0] ACPI: X2APIC_NMI (uid[0xf6] high level lint[0x1])
[    1.913974][    T0] ACPI: X2APIC_NMI (uid[0xf7] high level lint[0x1])
[    1.914017][    T0] IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
[    1.914032][    T0] IOAPIC[1]: apic_id 9, version 32, address 0xfec01000, GSI 24-31
[    1.914039][    T0] IOAPIC[2]: apic_id 10, version 32, address 0xfec08000, GSI 32-39
[    1.914047][    T0] IOAPIC[3]: apic_id 11, version 32, address 0xfec10000, GSI 40-47
[    1.914054][    T0] IOAPIC[4]: apic_id 12, version 32, address 0xfec18000, GSI 48-55
[    1.914062][    T0] IOAPIC[5]: apic_id 15, version 32, address 0xfec20000, GSI 56-63
[    1.[    7.994567][    T0] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[    8.006541][    T0] ACPI: Core revision 20200717
[    8.019713][    T0] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635855245 ns
[    8.029672][    T0] APIC: Switch to symmetric I/O mode setup
[    8.035354][    T0] DMAR: Host address width 46
[    8.039915][    T0] DMAR: DRHD base: 0x000000e5ffc000 flags: 0x0
[    8.046095][    T0] DMAR: dmar0: reg_base_addr e5ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    8.054840][    T0] DMAR: DRHD base: 0x000000e7ffc000 flags: 0x0
[    8.060997][    T0] DMAR: dmar1: reg_base_addr e7ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    8.069740][    T0] DMAR: DRHD base: 0x000000e9ffc000 flags: 0x0
[    8.075872][    T0] DMAR: dmar2: reg_base_addr e9ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    8.084615][    T0] DMAR: DRHD base: 0x000000ebffc000 flags: 0x0
[    8.090761][    T0] DMAR: dmar3: reg_base_addr ebffc000 ver 1:0 cap 8d2078c106f0466 ecap fMAR-IR: Enabled IRQ remapping in x2apic mode
[    8.513491][    T0] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    8.568289][    T0] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2b3e459bf4c, max_idle_ns: 440795289890 ns
[    8.579576][    T0] Calibrating delay loop (skipped), value calculated using timer frequency.. 6000.00 BogoMIPS (lpj=30000000)
[    8.589574][    T0] pid_max: default: 147456 minimum: 1152
[    8.714025][    T0] efi: memattr: Entry attributes invalid: RO and XP bits both cleared
[    8.719577][    T0] efi: memattr: ! 0x0000a057a000-0x0000a05b4fff [Runtime Code       |RUN|  |  |  |  |  |  |  |   |  |  |  |  ]
[    8.775355][    T0] Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes, vmalloc)
[    8.798868][    T0] Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes, vmalloc)
[    8.811550][    T0] Mount-cache hash table entries: 131072 (order: 8, 1048576 bytes, vmalloc)
[    8.820076][    T0] Mountpoint-cache hash table entries: 131072 (order: 8, 1048576 bytes, vmalloc)
[    8.879327][    T0] mce: CPU0: Thermal mo[    8.996916][    T1] Performance Events: PEBS fmt3+, Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
[    8.999591][    T1] ... version:                4
[    9.004310][    T1] ... bit width:              48
[    9.009118][    T1] ... generic registers:      4
[    9.009574][    T1] ... value mask:             0000ffffffffffff
[    9.015601][    T1] ... max period:             00007fffffffffff
[    9.019574][    T1] ... fixed-purpose events:   3
[    9.024294][    T1] ... event mask:             000000070000000f
[    9.034357][    T1] rcu: Hierarchical SRCU implementation.
[    9.062516][    T5] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.

> 
> There are quite a bunch of issues to solve:
> 
>   - X86 does not use the device::msi_domain pointer for historical reasons
>     and due to XEN, which makes it impossible to create an architecture
>     agnostic device MSI infrastructure.
> 
>   - X86 has its own msi_alloc_info data type which is pointlessly
>     different from the generic version and does not allow to share code.
> 
>   - The logic of composing MSI messages in a hierarchy is busted at the
>     core level and of course some (x86) drivers depend on that.
> 
>   - A few minor shortcomings as usual
> 
> This series addresses that in several steps:
> 
>  1) Accidental bug fixes
> 
>       iommu/amd: Prevent NULL pointer dereference
> 
>  2) Janitoring
> 
>       x86/init: Remove unused init ops
>       PCI: vmd: Dont abuse vector irqomain as parent
>       x86/msi: Remove pointless vcpu_affinity callback
> 
>  3) Sanitizing the composition of MSI messages in a hierarchy
>  
>       genirq/chip: Use the first chip in irq_chip_compose_msi_msg()
>       x86/msi: Move compose message callback where it belongs
> 
>  4) Simplification of the x86 specific interrupt allocation mechanism
> 
>       x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency
>       x86/irq: Add allocation type for parent domain retrieval
>       iommu/vt-d: Consolidate irq domain getter
>       iommu/amd: Consolidate irq domain getter
>       iommu/irq_remapping: Consolidate irq domain lookup
> 
>  5) Consolidation of the X86 specific interrupt allocation mechanism to be as
>     close as possible to the generic MSI allocation mechanism which allows to
>     get rid of quite a bunch of x86'isms which are pointless
> 
>       x86/irq: Prepare consolidation of irq_alloc_info
>       x86/msi: Consolidate HPET allocation
>       x86/ioapic: Consolidate IOAPIC allocation
>       x86/irq: Consolidate DMAR irq allocation
>       x86/irq: Consolidate UV domain allocation
>       PCI/MSI: Rework pci_msi_domain_calc_hwirq()
>       x86/msi: Consolidate MSI allocation
>       x86/msi: Use generic MSI domain ops
> 
>   6) x86 specific cleanups to remove the dependency on arch_*_msi_irqs()
> 
>       x86/irq: Move apic_post_init() invocation to one place
>       x86/pci: Reducde #ifdeffery in PCI init code
>       x86/irq: Initialize PCI/MSI domain at PCI init time
>       irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
>       PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
>       PCI/MSI: Provide pci_dev_has_special_msi_domain() helper
>       x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
>       x86/xen: Rework MSI teardown
>       x86/xen: Consolidate XEN-MSI init
>       irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
>       x86/xen: Wrap XEN MSI management into irqdomain
>       iommm/vt-d: Store irq domain in struct device
>       iommm/amd: Store irq domain in struct device
>       x86/pci: Set default irq domain in pcibios_add_device()
>       PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
>       x86/irq: Cleanup the arch_*_msi_irqs() leftovers
>       x86/irq: Make most MSI ops XEN private
>       iommu/vt-d: Remove domain search for PCI/MSI[X]
>       iommu/amd: Remove domain search for PCI/MSI
> 
>   7) X86 specific preparation for device MSI
> 
>       x86/irq: Add DEV_MSI allocation type
>       x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI
> 
>   8) Generic device MSI infrastructure
>       platform-msi: Provide default irq_chip:: Ack
>       genirq/proc: Take buslock on affinity write
>       genirq/msi: Provide and use msi_domain_set_default_info_flags()
>       platform-msi: Add device MSI infrastructure
>       irqdomain/msi: Provide msi_alloc/free_store() callbacks
> 
>   9) POC of IMS (Interrupt Message Store) irq domain and irqchip
>      implementations for both device array and queue storage.
> 
>       irqchip: Add IMS (Interrupt Message Storm) driver - NOT FOR MERGING
> 
> Changes vs. V1:
> 
>    - Addressed various review comments and addressed the 0day fallout.
>      - Corrected the XEN logic (Jürgen)
>      - Make the arch fallback in PCI/MSI opt-in not opt-out (Bjorn)
> 
>    - Fixed the compose MSI message inconsistency
> 
>    - Ensure that the necessary flags are set for device MSI
> 
>    - Make the irq bus logic work for affinity setting to prepare
>      support for IMS storage in queue memory. It turned out to be
>      less scary than I feared.
> 
>    - Remove leftovers in iommu/intel|amd
> 
>    - Reworked the IMS POC driver to cover queue storage so Jason can have a
>      look whether that fits the needs of MLX devices.
> 
> The whole lot is also available from git:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git device-msi
> 
> This has been tested on Intel/AMD/KVM but lacks testing on:
> 
>     - HYPERV (-ENODEV)
>     - VMD enabled systems (-ENODEV)
>     - XEN (-ENOCLUE)
>     - IMS (-ENODEV)
> 
>     - Any non-X86 code which might depend on the broken compose MSI message
>       logic. Marc expects not much fallout, but agrees that we need to fix
>       it anyway.
> 
> #1 - #3 should be applied unconditionally for obvious reasons
> #4 - #6 are worthwhile cleanups which should be done independent of device MSI
> 
> #7 - #8 look promising to cleanup the platform MSI implementation
>      	independent of #8, but I neither had cycles nor the stomach to
>      	tackle that.
> 
> #9	is obviously just for the folks interested in IMS
> 
> Thanks,
> 
> 	tglx


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-09-25 15:29 ` [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Qian Cai
@ 2020-09-25 15:49   ` Peter Zijlstra
  2020-09-25 23:14     ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2020-09-25 15:49 UTC (permalink / raw)
  To: Qian Cai
  Cc: Thomas Gleixner, LKML, Stephen Rothwell, linux-next, x86,
	Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang, Jon Derrick,
	Lu Baolu, Wei Liu, K. Y. Srinivasan, Stephen Hemminger,
	Steve Wahl, Dimitri Sivanich, Russ Anderson, linux-pci,
	Bjorn Helgaas, Lorenzo Pieralisi, Konrad Rzeszutek Wilk,
	xen-devel, Juergen Gross, Boris Ostrovsky, Stefano Stabellini,
	Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

On Fri, Sep 25, 2020 at 11:29:13AM -0400, Qian Cai wrote:

> It looks like the crashes happen in the interrupt remapping code, where they are
> only able to generate partial call traces.

> [    8.466614][    T0] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [    8.474295][    T0] #PF: supervisor instruction fetch in kernel mode
> [    8.480669][    T0] #PF: error_code(0x0010) - not-present page
> [    8.486518][    T0] PGD 0 P4D 0 
> [    8.489757][    T0] Oops: 0010 [#1] SMP KASAN PTI
> [    8.494476][    T0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I       5.9.0-rc6-next-20200925 #2
> [    8.503987][    T0] Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10, BIOS U34 11/13/2019
> [    8.513238][    T0] RIP: 0010:0x0
> [    8.516562][    T0] Code: Bad RIP v

Here it looks like this:

[    1.830276] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    1.838043] #PF: supervisor instruction fetch in kernel mode
[    1.844357] #PF: error_code(0x0010) - not-present page
[    1.850090] PGD 0 P4D 0
[    1.852915] Oops: 0010 [#1] SMP
[    1.856419] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc6-00700-g0248dedd12d4 #419
[    1.865447] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[    1.876902] RIP: 0010:0x0
[    1.879824] Code: Bad RIP value.
[    1.883423] RSP: 0000:ffffffff82803da0 EFLAGS: 00010282
[    1.889251] RAX: 0000000000000000 RBX: ffffffff8282b980 RCX: ffffffff82803e40
[    1.897241] RDX: 0000000000000001 RSI: ffffffff82803e40 RDI: ffffffff8282b980
[    1.905201] RBP: ffff88842f331000 R08: 00000000ffffffff R09: 0000000000000001
[    1.913162] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000048
[    1.921123] R13: ffffffff82803e40 R14: ffffffff8282b9c0 R15: 0000000000000000
[    1.929085] FS:  0000000000000000(0000) GS:ffff88842f400000(0000) knlGS:0000000000000000
[    1.938113] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.944524] CR2: ffffffffffffffd6 CR3: 0000000002811001 CR4: 00000000000606b0
[    1.952484] Call Trace:
[    1.955214]  msi_domain_alloc+0x36/0x130
[    1.959594]  __irq_domain_alloc_irqs+0x165/0x380
[    1.964748]  dmar_alloc_hwirq+0x9a/0x120
[    1.969127]  dmar_set_interrupt.part.0+0x1c/0x60
[    1.974281]  enable_drhd_fault_handling+0x2c/0x6c
[    1.979532]  apic_intr_mode_init+0xfa/0x100
[    1.984191]  x86_late_time_init+0x20/0x30
[    1.988662]  start_kernel+0x723/0x7e6
[    1.992748]  secondary_startup_64_no_verify+0xa6/0xab
[    1.998386] Modules linked in:
[    2.001794] CR2: 0000000000000000
[    2.005510] ---[ end trace 837dc60d7c66efa2 ]---


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-09-25 15:49   ` Peter Zijlstra
@ 2020-09-25 23:14     ` Thomas Gleixner
  2020-09-27  8:46       ` [PATCH] x86/apic/msi: Unbreak DMAR and HPET MSI Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2020-09-25 23:14 UTC (permalink / raw)
  To: Peter Zijlstra, Qian Cai
  Cc: LKML, Stephen Rothwell, linux-next, x86, Joerg Roedel, iommu,
	linux-hyperv, Haiyang Zhang, Jon Derrick, Lu Baolu, Wei Liu,
	K. Y. Srinivasan, Stephen Hemminger, Steve Wahl,
	Dimitri Sivanich, Russ Anderson, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Konrad Rzeszutek Wilk, xen-devel,
	Juergen Gross, Boris Ostrovsky, Stefano Stabellini, Marc Zyngier,
	Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

On Fri, Sep 25 2020 at 17:49, Peter Zijlstra wrote:
> Here it looks like this:
>
> [    1.830276] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [    1.838043] #PF: supervisor instruction fetch in kernel mode
> [    1.844357] #PF: error_code(0x0010) - not-present page
> [    1.850090] PGD 0 P4D 0
> [    1.852915] Oops: 0010 [#1] SMP
> [    1.856419] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc6-00700-g0248dedd12d4 #419
> [    1.865447] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> [    1.876902] RIP: 0010:0x0
> [    1.879824] Code: Bad RIP value.
> [    1.883423] RSP: 0000:ffffffff82803da0 EFLAGS: 00010282
> [    1.889251] RAX: 0000000000000000 RBX: ffffffff8282b980 RCX: ffffffff82803e40
> [    1.897241] RDX: 0000000000000001 RSI: ffffffff82803e40 RDI: ffffffff8282b980
> [    1.905201] RBP: ffff88842f331000 R08: 00000000ffffffff R09: 0000000000000001
> [    1.913162] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000048
> [    1.921123] R13: ffffffff82803e40 R14: ffffffff8282b9c0 R15: 0000000000000000
> [    1.929085] FS:  0000000000000000(0000) GS:ffff88842f400000(0000) knlGS:0000000000000000
> [    1.938113] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.944524] CR2: ffffffffffffffd6 CR3: 0000000002811001 CR4: 00000000000606b0
> [    1.952484] Call Trace:
> [    1.955214]  msi_domain_alloc+0x36/0x130

Hrm. That looks like a not initialized mandatory callback. Confused.

Is this on -next and if so, does this happen on tip:x86/irq as well?

Can you provide your config please?

Thanks,

        tglx
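
An oops with "supervisor instruction fetch" and RIP 0x0 is what an indirect
call through a NULL function pointer looks like, which fits the "not
initialized mandatory callback" reading above. The following is only a minimal
userspace sketch of that failure mode and of a defensive check. Every name in
it (domain_ops_sketch, domain_alloc_sketch, init_ok) is hypothetical, and it
makes no claim about which kernel callback was actually unset.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical ops table, loosely modelled on a set of domain callbacks. */
struct domain_ops_sketch {
	int (*init)(unsigned int virq);	/* treated as mandatory in this sketch */
};

/* Defensive variant: refuse to call a callback that was never filled in. */
static int domain_alloc_sketch(const struct domain_ops_sketch *ops,
			       unsigned int virq)
{
	if (!ops || !ops->init)
		return -EINVAL;	/* calling ops->init here would jump to address 0 */
	return ops->init(virq);
}

/* Trivial callback used to exercise the populated case. */
static int init_ok(unsigned int virq)
{
	return (int)virq;
}
```

Without the check, domain_alloc_sketch() on an ops table whose init member was
left NULL would fault on the instruction fetch itself rather than on a data
access, so the trace shows only the caller and no symbol for the faulting
address.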

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-09-25 13:54   ` [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable Qian Cai
@ 2020-09-26 12:38     ` Vasily Gorbik
  2020-09-28 10:11       ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: Vasily Gorbik @ 2020-09-26 12:38 UTC (permalink / raw)
  To: Thomas Gleixner, Qian Cai
  Cc: LKML, Heiko Carstens, Christian Borntraeger, linux-s390,
	Stephen Rothwell, linux-next, x86, Joerg Roedel, iommu,
	linux-hyperv, Haiyang Zhang, Jon Derrick, Lu Baolu, Wei Liu,
	K. Y. Srinivasan, Stephen Hemminger, Steve Wahl,
	Dimitri Sivanich, Russ Anderson, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Konrad Rzeszutek Wilk, xen-devel,
	Juergen Gross, Boris Ostrovsky, Stefano Stabellini, Marc Zyngier,
	Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

On Fri, Sep 25, 2020 at 09:54:52AM -0400, Qian Cai wrote:
> On Wed, 2020-08-26 at 13:17 +0200, Thomas Gleixner wrote:
> > From: Thomas Gleixner <tglx@linutronix.de>
> > 
> > The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
> > requires them or not. Architectures which are fully utilizing hierarchical
> > irq domains should never call into that code.
> > 
> > It's not only architectures which depend on that by implementing one or
> > more of the weak functions, there is also a bunch of drivers which relies
> > on the weak functions which invoke msi_controller::setup_irq[s] and
> > msi_controller::teardown_irq.
> > 
> > Make the architectures and drivers which rely on them select them in Kconfig
> > and if not selected replace them by stub functions which emit a warning and
> > fail the PCI/MSI interrupt allocation.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> Today's linux-next will have some warnings on s390x:
> 
> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/s390.config
> 
> WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS
>   Depends on [n]: PCI [=n]
>   Selected by [y]:
>   - S390 [=y]
> 
> WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS
>   Depends on [n]: PCI [=n]
>   Selected by [y]:
>   - S390 [=y]
>

Yes, as well as on mips and sparc which also don't FORCE_PCI.
This seems to work for s390:

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index b0b7acf07eb8..41136fbe909b 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -192,3 +192,3 @@ config S390
        select PCI_MSI                  if PCI
-       select PCI_MSI_ARCH_FALLBACKS
+       select PCI_MSI_ARCH_FALLBACKS   if PCI
        select SET_FS


* [PATCH] x86/apic/msi: Unbreak DMAR and HPET MSI
  2020-09-25 23:14     ` Thomas Gleixner
@ 2020-09-27  8:46       ` Thomas Gleixner
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Gleixner @ 2020-09-27  8:46 UTC (permalink / raw)
  To: Peter Zijlstra, Qian Cai
  Cc: LKML, Stephen Rothwell, linux-next, x86, Joerg Roedel, iommu,
	linux-hyperv, Haiyang Zhang, Jon Derrick, Lu Baolu, Wei Liu,
	K. Y. Srinivasan, Stephen Hemminger, Steve Wahl,
	Dimitri Sivanich, Russ Anderson, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Konrad Rzeszutek Wilk, xen-devel,
	Juergen Gross, Boris Ostrovsky, Stefano Stabellini, Marc Zyngier,
	Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

Switching the DMAR and HPET MSI code to use the generic MSI domain ops
failed to add the flag which tells the core code to update the domain
operations with the defaults. As a consequence the core code crashes
when an interrupt in one of those domains is allocated.

Add the missing flags.

Fixes: 9006c133a422 ("x86/msi: Use generic MSI domain ops")
Reported-by: Qian Cai <cai@redhat.com> 
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/msi.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -309,6 +309,7 @@ static struct msi_domain_ops dmar_msi_do
 static struct msi_domain_info dmar_msi_domain_info = {
 	.ops		= &dmar_msi_domain_ops,
 	.chip		= &dmar_msi_controller,
+	.flags		= MSI_FLAG_USE_DEF_DOM_OPS,
 };
 
 static struct irq_domain *dmar_get_irq_domain(void)
@@ -408,6 +409,7 @@ static struct msi_domain_ops hpet_msi_do
 static struct msi_domain_info hpet_msi_domain_info = {
 	.ops		= &hpet_msi_domain_ops,
 	.chip		= &hpet_msi_controller,
+	.flags		= MSI_FLAG_USE_DEF_DOM_OPS,
 };
 
 struct irq_domain *hpet_create_irq_domain(int hpet_id)
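
For readers following along, the failure mode described in the changelog can be sketched in user space. This is a simplified model, not the kernel implementation: every model_* name and MODEL_FLAG_USE_DEF_DOM_OPS below is a hypothetical stand-in, and the single get_hwirq member stands for whichever mandatory callbacks the core expects. The assumption is that the core fills unset callbacks with defaults only when the flag is set, so a domain created without the flag reaches allocation with a NULL callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MSI_FLAG_USE_DEF_DOM_OPS; not the kernel value. */
#define MODEL_FLAG_USE_DEF_DOM_OPS 0x1u

/* Simplified stand-in for the domain ops: one mandatory callback. */
struct model_domain_ops {
	int (*get_hwirq)(int arg);
};

/* Simplified stand-in for the domain info passed at domain creation. */
struct model_domain_info {
	unsigned int flags;
	struct model_domain_ops *ops;
};

/* Default callback the core would install: here it just echoes its argument. */
static int model_default_get_hwirq(int arg)
{
	return arg;
}

/* Domain creation: defaults are filled in only when the flag asks for it. */
static void model_create_domain(struct model_domain_info *info)
{
	assert(info != NULL && info->ops != NULL);

	if (!(info->flags & MODEL_FLAG_USE_DEF_DOM_OPS))
		return;

	if (info->ops->get_hwirq == NULL)
		info->ops->get_hwirq = model_default_get_hwirq;
}

/*
 * Allocation path: returns the hwirq, or -1 when the mandatory callback was
 * never set. The model checks the pointer so it can report the problem;
 * calling through a NULL pointer instead is what produces a crash.
 */
static int model_domain_alloc(const struct model_domain_info *info, int arg)
{
	if (info->ops->get_hwirq == NULL)
		return -1;

	return info->ops->get_hwirq(arg);
}
```

With flags left at zero, model_create_domain() leaves get_hwirq unset and model_domain_alloc() reports the missing callback. Setting the flag before creation lets the same allocation succeed, which mirrors the two one-line additions in the patch above.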


* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-09-26 12:38     ` Vasily Gorbik
@ 2020-09-28 10:11       ` Thomas Gleixner
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Gleixner @ 2020-09-28 10:11 UTC (permalink / raw)
  To: Vasily Gorbik, Qian Cai
  Cc: LKML, Heiko Carstens, Christian Borntraeger, linux-s390,
	Stephen Rothwell, linux-next, x86, Joerg Roedel, iommu,
	linux-hyperv, Haiyang Zhang, Jon Derrick, Lu Baolu, Wei Liu,
	K. Y. Srinivasan, Stephen Hemminger, Steve Wahl,
	Dimitri Sivanich, Russ Anderson, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Konrad Rzeszutek Wilk, xen-devel,
	Juergen Gross, Boris Ostrovsky, Stefano Stabellini, Marc Zyngier,
	Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

On Sat, Sep 26 2020 at 14:38, Vasily Gorbik wrote:
> On Fri, Sep 25, 2020 at 09:54:52AM -0400, Qian Cai wrote:
> Yes, as well as on mips and sparc which also don't FORCE_PCI.
> This seems to work for s390:
>
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index b0b7acf07eb8..41136fbe909b 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -192,3 +192,3 @@ config S390
>         select PCI_MSI                  if PCI
> -       select PCI_MSI_ARCH_FALLBACKS
> +       select PCI_MSI_ARCH_FALLBACKS   if PCI
>         select SET_FS

lemme fix that for all of them ...


Thread overview: 7+ messages
     [not found] <20200826111628.794979401@linutronix.de>
     [not found] ` <20200826112333.992429909@linutronix.de>
2020-09-25 13:54   ` [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable Qian Cai
2020-09-26 12:38     ` Vasily Gorbik
2020-09-28 10:11       ` Thomas Gleixner
2020-09-25 15:29 ` [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Qian Cai
2020-09-25 15:49   ` Peter Zijlstra
2020-09-25 23:14     ` Thomas Gleixner
2020-09-27  8:46       ` [PATCH] x86/apic/msi: Unbreak DMAR and HPET MSI Thomas Gleixner
