linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
@ 2020-08-26 11:16 Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 01/46] iommu/amd: Prevent NULL pointer dereference Thomas Gleixner
                   ` (53 more replies)
  0 siblings, 54 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

This is the second version of providing a base to support device MSI (non
PCI based) and on top of that support for IMS (Interrupt Message Storm)
based devices in a halfways architecture independent way.

The first version can be found here:

    https://lore.kernel.org/r/20200821002424.119492231@linutronix.de

It's still a mixed bag of bug fixes, cleanups and general improvements
which are worthwhile independent of device MSI.

There are quite a bunch of issues to solve:

  - X86 does not use the device::msi_domain pointer for historical reasons
    and due to XEN, which makes it impossible to create an architecture
    agnostic device MSI infrastructure.

  - X86 has it's own msi_alloc_info data type which is pointlessly
    different from the generic version and does not allow to share code.

  - The logic of composing MSI messages in an hierarchy is busted at the
    core level and of course some (x86) drivers depend on that.

  - A few minor shortcomings as usual

This series addresses that in several steps:

 1) Accidental bug fixes

      iommu/amd: Prevent NULL pointer dereference

 2) Janitoring

      x86/init: Remove unused init ops
      PCI: vmd: Dont abuse vector irqomain as parent
      x86/msi: Remove pointless vcpu_affinity callback

 3) Sanitizing the composition of MSI messages in a hierarchy
 
      genirq/chip: Use the first chip in irq_chip_compose_msi_msg()
      x86/msi: Move compose message callback where it belongs

 4) Simplification of the x86 specific interrupt allocation mechanism

      x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency
      x86/irq: Add allocation type for parent domain retrieval
      iommu/vt-d: Consolidate irq domain getter
      iommu/amd: Consolidate irq domain getter
      iommu/irq_remapping: Consolidate irq domain lookup

 5) Consolidation of the X86 specific interrupt allocation mechanism to be as close
    as possible to the generic MSI allocation mechanism which allows to get rid
    of quite a bunch of x86'isms which are pointless

      x86/irq: Prepare consolidation of irq_alloc_info
      x86/msi: Consolidate HPET allocation
      x86/ioapic: Consolidate IOAPIC allocation
      x86/irq: Consolidate DMAR irq allocation
      x86/irq: Consolidate UV domain allocation
      PCI/MSI: Rework pci_msi_domain_calc_hwirq()
      x86/msi: Consolidate MSI allocation
      x86/msi: Use generic MSI domain ops

  6) x86 specific cleanups to remove the dependency on arch_*_msi_irqs()

      x86/irq: Move apic_post_init() invocation to one place
      x86/pci: Reducde #ifdeffery in PCI init code
      x86/irq: Initialize PCI/MSI domain at PCI init time
      irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
      PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
      PCI/MSI: Provide pci_dev_has_special_msi_domain() helper
      x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
      x86/xen: Rework MSI teardown
      x86/xen: Consolidate XEN-MSI init
      irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
      x86/xen: Wrap XEN MSI management into irqdomain
      iommm/vt-d: Store irq domain in struct device
      iommm/amd: Store irq domain in struct device
      x86/pci: Set default irq domain in pcibios_add_device()
      PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
      x86/irq: Cleanup the arch_*_msi_irqs() leftovers
      x86/irq: Make most MSI ops XEN private
      iommu/vt-d: Remove domain search for PCI/MSI[X]
      iommu/amd: Remove domain search for PCI/MSI

  7) X86 specific preparation for device MSI

      x86/irq: Add DEV_MSI allocation type
      x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI

  8) Generic device MSI infrastructure
      platform-msi: Provide default irq_chip:: Ack
      genirq/proc: Take buslock on affinity write
      genirq/msi: Provide and use msi_domain_set_default_info_flags()
      platform-msi: Add device MSI infrastructure
      irqdomain/msi: Provide msi_alloc/free_store() callbacks

  9) POC of IMS (Interrupt Message Storm) irq domain and irqchip
     implementations for both device array and queue storage.

      irqchip: Add IMS (Interrupt Message Storm) driver - NOT FOR MERGING

Changes vs. V1:

   - Addressed various review comments and addressed the 0day fallout.
     - Corrected the XEN logic (Jürgen)
     - Make the arch fallback in PCI/MSI opt-in not opt-out (Bjorn)

   - Fixed the compose MSI message inconsistency

   - Ensure that the necessary flags are set for device SMI

   - Make the irq bus logic work for affinity setting to prepare
     support for IMS storage in queue memory. It turned out to be
     less scary than I feared.

   - Remove leftovers in iommu/intel|amd

   - Reworked the IMS POC driver to cover queue storage so Jason can have a
     look whether that fits the needs of MLX devices.

The whole lot is also available from git:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git device-msi

This has been tested on Intel/AMD/KVM but lacks testing on:

    - HYPERV (-ENODEV)
    - VMD enabled systems (-ENODEV)
    - XEN (-ENOCLUE)
    - IMS (-ENODEV)

    - Any non-X86 code which might depend on the broken compose MSI message
      logic. Marc excpects not much fallout, but agrees that we need to fix
      it anyway.

#1 - #3 should be applied unconditionally for obvious reasons
#4 - #6 are wortwhile cleanups which should be done independent of device MSI

#7 - #8 look promising to cleanup the platform MSI implementation
     	independent of #8, but I neither had cycles nor the stomach to
     	tackle that.

#9	is obviously just for the folks interested in IMS

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 01/46] iommu/amd: Prevent NULL pointer dereference
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-27 14:57   ` Joerg Roedel
  2020-08-26 11:16 ` [patch V2 02/46] x86/init: Remove unused init ops Thomas Gleixner
                   ` (52 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Dereferencing irq_data before checking it for NULL is suboptimal.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 drivers/iommu/amd/iommu.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3717,8 +3717,8 @@ static int irq_remapping_alloc(struct ir
 
 	for (i = 0; i < nr_irqs; i++) {
 		irq_data = irq_domain_get_irq_data(domain, virq + i);
-		cfg = irqd_cfg(irq_data);
-		if (!irq_data || !cfg) {
+		cfg = irq_data ? irqd_cfg(irq_data) : NULL;
+		if (!cfg) {
 			ret = -EINVAL;
 			goto out_free_data;
 		}



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 02/46] x86/init: Remove unused init ops
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 01/46] iommu/amd: Prevent NULL pointer dereference Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 03/46] PCI: vmd: Dont abuse vector irqomain as parent Thomas Gleixner
                   ` (51 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Some past platform removal forgot to get rid of this unused ballast.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/mpspec.h   |   10 ----------
 arch/x86/include/asm/x86_init.h |   10 ----------
 arch/x86/kernel/mpparse.c       |   26 ++++----------------------
 arch/x86/kernel/x86_init.c      |    4 ----
 4 files changed, 4 insertions(+), 46 deletions(-)

--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -67,21 +67,11 @@ static inline void find_smp_config(void)
 #ifdef CONFIG_X86_MPPARSE
 extern void e820__memblock_alloc_reserved_mpc_new(void);
 extern int enable_update_mptable;
-extern int default_mpc_apic_id(struct mpc_cpu *m);
-extern void default_smp_read_mpc_oem(struct mpc_table *mpc);
-# ifdef CONFIG_X86_IO_APIC
-extern void default_mpc_oem_bus_info(struct mpc_bus *m, char *str);
-# else
-#  define default_mpc_oem_bus_info NULL
-# endif
 extern void default_find_smp_config(void);
 extern void default_get_smp_config(unsigned int early);
 #else
 static inline void e820__memblock_alloc_reserved_mpc_new(void) { }
 #define enable_update_mptable 0
-#define default_mpc_apic_id NULL
-#define default_smp_read_mpc_oem NULL
-#define default_mpc_oem_bus_info NULL
 #define default_find_smp_config x86_init_noop
 #define default_get_smp_config x86_init_uint_noop
 #endif
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -11,22 +11,12 @@ struct cpuinfo_x86;
 
 /**
  * struct x86_init_mpparse - platform specific mpparse ops
- * @mpc_record:			platform specific mpc record accounting
  * @setup_ioapic_ids:		platform specific ioapic id override
- * @mpc_apic_id:		platform specific mpc apic id assignment
- * @smp_read_mpc_oem:		platform specific oem mpc table setup
- * @mpc_oem_pci_bus:		platform specific pci bus setup (default NULL)
- * @mpc_oem_bus_info:		platform specific mpc bus info
  * @find_smp_config:		find the smp configuration
  * @get_smp_config:		get the smp configuration
  */
 struct x86_init_mpparse {
-	void (*mpc_record)(unsigned int mode);
 	void (*setup_ioapic_ids)(void);
-	int (*mpc_apic_id)(struct mpc_cpu *m);
-	void (*smp_read_mpc_oem)(struct mpc_table *mpc);
-	void (*mpc_oem_pci_bus)(struct mpc_bus *m);
-	void (*mpc_oem_bus_info)(struct mpc_bus *m, char *name);
 	void (*find_smp_config)(void);
 	void (*get_smp_config)(unsigned int early);
 };
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -46,11 +46,6 @@ static int __init mpf_checksum(unsigned
 	return sum & 0xFF;
 }
 
-int __init default_mpc_apic_id(struct mpc_cpu *m)
-{
-	return m->apicid;
-}
-
 static void __init MP_processor_info(struct mpc_cpu *m)
 {
 	int apicid;
@@ -61,7 +56,7 @@ static void __init MP_processor_info(str
 		return;
 	}
 
-	apicid = x86_init.mpparse.mpc_apic_id(m);
+	apicid = m->apicid;
 
 	if (m->cpuflag & CPU_BOOTPROCESSOR) {
 		bootup_cpu = " (Bootup-CPU)";
@@ -73,7 +68,7 @@ static void __init MP_processor_info(str
 }
 
 #ifdef CONFIG_X86_IO_APIC
-void __init default_mpc_oem_bus_info(struct mpc_bus *m, char *str)
+static void __init mpc_oem_bus_info(struct mpc_bus *m, char *str)
 {
 	memcpy(str, m->bustype, 6);
 	str[6] = 0;
@@ -84,7 +79,7 @@ static void __init MP_bus_info(struct mp
 {
 	char str[7];
 
-	x86_init.mpparse.mpc_oem_bus_info(m, str);
+	mpc_oem_bus_info(m, str);
 
 #if MAX_MP_BUSSES < 256
 	if (m->busid >= MAX_MP_BUSSES) {
@@ -100,9 +95,6 @@ static void __init MP_bus_info(struct mp
 		mp_bus_id_to_type[m->busid] = MP_BUS_ISA;
 #endif
 	} else if (strncmp(str, BUSTYPE_PCI, sizeof(BUSTYPE_PCI) - 1) == 0) {
-		if (x86_init.mpparse.mpc_oem_pci_bus)
-			x86_init.mpparse.mpc_oem_pci_bus(m);
-
 		clear_bit(m->busid, mp_bus_not_pci);
 #ifdef CONFIG_EISA
 		mp_bus_id_to_type[m->busid] = MP_BUS_PCI;
@@ -198,8 +190,6 @@ static void __init smp_dump_mptable(stru
 			1, mpc, mpc->length, 1);
 }
 
-void __init default_smp_read_mpc_oem(struct mpc_table *mpc) { }
-
 static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
 {
 	char str[16];
@@ -218,14 +208,7 @@ static int __init smp_read_mpc(struct mp
 	if (early)
 		return 1;
 
-	if (mpc->oemptr)
-		x86_init.mpparse.smp_read_mpc_oem(mpc);
-
-	/*
-	 *      Now process the configuration blocks.
-	 */
-	x86_init.mpparse.mpc_record(0);
-
+	/* Now process the configuration blocks. */
 	while (count < mpc->length) {
 		switch (*mpt) {
 		case MP_PROCESSOR:
@@ -256,7 +239,6 @@ static int __init smp_read_mpc(struct mp
 			count = mpc->length;
 			break;
 		}
-		x86_init.mpparse.mpc_record(1);
 	}
 
 	if (!num_processors)
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -67,11 +67,7 @@ struct x86_init_ops x86_init __initdata
 	},
 
 	.mpparse = {
-		.mpc_record		= x86_init_uint_noop,
 		.setup_ioapic_ids	= x86_init_noop,
-		.mpc_apic_id		= default_mpc_apic_id,
-		.smp_read_mpc_oem	= default_smp_read_mpc_oem,
-		.mpc_oem_bus_info	= default_mpc_oem_bus_info,
 		.find_smp_config	= default_find_smp_config,
 		.get_smp_config		= default_get_smp_config,
 	},



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 03/46] PCI: vmd: Dont abuse vector irqomain as parent
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 01/46] iommu/amd: Prevent NULL pointer dereference Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 02/46] x86/init: Remove unused init ops Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 04/46] genirq/chip: Use the first chip in irq_chip_compose_msi_msg() Thomas Gleixner
                   ` (50 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

VMD has it's own PCI/MSI interrupt domain which is not in any way depending
on the x86 vector domain. PCI devices behind VMD share the VMD MSIX vector
entries via a VMD specific message translation to the actual VMD MSIX
vector. The VMD device interrupt handler for the VMD MSIX vectors invokes
all interrupt handlers of the devices which share a vector.

Making the x86 vector domain the actual parent of the VMD irq domain is
pointless and actually counterproductive. When a device interrupt is
requested then it will activate the interrupt which traverses down the
hierarchy and consumes an interrupt vector in the vector domain which is
never used.

The domain is self contained and has no parent dependencies, so just hand
in NULL for the parent and be done with it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jonathan Derrick <jonathan.derrick@intel.com>
Cc: linux-pci@vger.kernel.org
---
V2: New patch.
---
 drivers/pci/controller/vmd.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -573,7 +573,8 @@ static int vmd_enable_domain(struct vmd_
 		return -ENODEV;
 
 	vmd->irq_domain = pci_msi_create_irq_domain(fn, &vmd_msi_domain_info,
-						    x86_vector_domain);
+						    NULL);
+
 	if (!vmd->irq_domain) {
 		irq_domain_free_fwnode(fn);
 		return -ENODEV;


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 04/46] genirq/chip: Use the first chip in irq_chip_compose_msi_msg()
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (2 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 03/46] PCI: vmd: Dont abuse vector irqomain as parent Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 19:50   ` Marc Zyngier
  2020-08-26 11:16 ` [patch V2 05/46] x86/msi: Move compose message callback where it belongs Thomas Gleixner
                   ` (49 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

The documentation of irq_chip_compose_msi_msg() claims that with
hierarchical irq domains the first chip in the hierarchy which has an
irq_compose_msi_msg() callback is chosen. But the code just keeps
iterating after it finds a chip with a compose callback.

The x86 HPET MSI implementation relies on that behaviour, but that does not
make it more correct.

The message should always be composed at the domain which manages the
underlying resource (e.g. APIC or remap table) because that domain knows
about the required layout of the message.

On X86 the following hierarchies exist:

1)   vector -------- PCI/MSI
2)   vector -- IR -- PCI/MSI

The vector domain has a different message format than the IR (remapping)
domain. So obviously the PCI/MSI domain can't compose the message without
having knowledge about the parent domain, which is exactly the opposite of
what hierarchical domains want to achieve.

X86 actually has two different PCI/MSI chips where #1 has a compose
callback and #2 does not. #2 delegates the composition to the remap domain
where it belongs, but #1 does it at the PCI/MSI level.

For the upcoming device MSI support it's necessary to change this and just
let the first domain which can compose the message take care of it. That
way the top level chip does not have to worry about it and the device MSI
code does not need special knowledge about topologies. It just sets the
compose callback to NULL and lets the hierarchy pick the first chip which
has one.

Due to that the attempt to move the compose callback from the direct
delivery PCI/MSI domain to the vector domain made the system fail to boot
with interrupt remapping enabled because in the remapping case
irq_chip_compose_msi_msg() keeps iterating and choses the compose callback
of the vector domain which obviously creates the wrong format for the remap
table.

Break out of the loop when the first irq chip with a compose callback is
found and fixup the HPET code temporarily. That workaround will be removed
once the direct delivery compose callback is moved to the place where it
belongs in the vector domain.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch. Note, that this might break other stuff which relies on the
    current behaviour, but the hierarchy composition of DT based chips is
    really hard to follow.
---
 arch/x86/kernel/apic/msi.c |    7 +++++--
 kernel/irq/chip.c          |   12 +++++++++---
 2 files changed, 14 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -479,10 +479,13 @@ struct irq_domain *hpet_create_irq_domai
 	info.type = X86_IRQ_ALLOC_TYPE_HPET;
 	info.hpet_id = hpet_id;
 	parent = irq_remapping_get_ir_irq_domain(&info);
-	if (parent == NULL)
+	if (parent == NULL) {
 		parent = x86_vector_domain;
-	else
+	} else {
 		hpet_msi_controller.name = "IR-HPET-MSI";
+		/* Temporary fix: Will go away */
+		hpet_msi_controller.irq_compose_msi_msg = NULL;
+	}
 
 	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
 					      hpet_id);
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1544,10 +1544,16 @@ int irq_chip_compose_msi_msg(struct irq_
 	struct irq_data *pos = NULL;
 
 #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
-	for (; data; data = data->parent_data)
-#endif
-		if (data->chip && data->chip->irq_compose_msi_msg)
+	for (; data; data = data->parent_data) {
+		if (data->chip && data->chip->irq_compose_msi_msg) {
 			pos = data;
+			break;
+		}
+	}
+#else
+	if (data->chip && data->chip->irq_compose_msi_msg)
+		pos = data;
+#endif
 	if (!pos)
 		return -ENOSYS;
 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 05/46] x86/msi: Move compose message callback where it belongs
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (3 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 04/46] genirq/chip: Use the first chip in irq_chip_compose_msi_msg() Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 06/46] x86/msi: Remove pointless vcpu_affinity callback Thomas Gleixner
                   ` (48 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

Composing the MSI message at the MSI chip level is wrong because the
underlying parent domain is the one which knows how the message should be
composed for the direct vector delivery or the interrupt remapping table
entry.

The interrupt remapping aware PCI/MSI chip does that already. Make the
direct delivery chip do the same and move the composition of the direct
delivery MSI message to the vector domain irq chip.

This prepares for the upcoming device MSI support to avoid having
architecture specific knowledge in the device MSI domain irq chips.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/include/asm/apic.h   |    8 ++++++++
 arch/x86/kernel/apic/msi.c    |   12 +++---------
 arch/x86/kernel/apic/vector.c |    1 +
 3 files changed, 12 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -519,6 +519,14 @@ static inline bool apic_id_is_primary_th
 static inline void apic_smt_update(void) { }
 #endif
 
+struct msi_msg;
+
+#ifdef CONFIG_PCI_MSI
+void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg);
+#else
+# define x86_vector_msi_compose_msg NULL
+#endif
+
 extern void ioapic_zap_locks(void);
 
 #endif /* _ASM_X86_APIC_H */
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -45,7 +45,7 @@ static void __irq_msi_compose_msg(struct
 		MSI_DATA_VECTOR(cfg->vector);
 }
 
-static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
 	__irq_msi_compose_msg(irqd_cfg(data), msg);
 }
@@ -177,7 +177,6 @@ static struct irq_chip pci_msi_controlle
 	.irq_mask		= pci_msi_mask_irq,
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
-	.irq_compose_msi_msg	= irq_msi_compose_msg,
 	.irq_set_affinity	= msi_set_affinity,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
@@ -321,7 +320,6 @@ static struct irq_chip dmar_msi_controll
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_set_affinity	= msi_domain_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
-	.irq_compose_msi_msg	= irq_msi_compose_msg,
 	.irq_write_msi_msg	= dmar_msi_write_msg,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
@@ -419,7 +417,6 @@ static struct irq_chip hpet_msi_controll
 	.irq_ack = irq_chip_ack_parent,
 	.irq_set_affinity = msi_domain_set_affinity,
 	.irq_retrigger = irq_chip_retrigger_hierarchy,
-	.irq_compose_msi_msg = irq_msi_compose_msg,
 	.irq_write_msi_msg = hpet_msi_write_msg,
 	.flags = IRQCHIP_SKIP_SET_WAKE,
 };
@@ -479,13 +476,10 @@ struct irq_domain *hpet_create_irq_domai
 	info.type = X86_IRQ_ALLOC_TYPE_HPET;
 	info.hpet_id = hpet_id;
 	parent = irq_remapping_get_ir_irq_domain(&info);
-	if (parent == NULL) {
+	if (parent == NULL)
 		parent = x86_vector_domain;
-	} else {
+	else
 		hpet_msi_controller.name = "IR-HPET-MSI";
-		/* Temporary fix: Will go away */
-		hpet_msi_controller.irq_compose_msi_msg = NULL;
-	}
 
 	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
 					      hpet_id);
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -823,6 +823,7 @@ static struct irq_chip lapic_controller
 	.name			= "APIC",
 	.irq_ack		= apic_ack_edge,
 	.irq_set_affinity	= apic_set_affinity,
+	.irq_compose_msi_msg	= x86_vector_msi_compose_msg,
 	.irq_retrigger		= apic_retrigger_irq,
 };
 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 06/46] x86/msi: Remove pointless vcpu_affinity callback
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (4 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 05/46] x86/msi: Move compose message callback where it belongs Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 07/46] x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency Thomas Gleixner
                   ` (47 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

Setting the irq_set_vcpu_affinity() callback to
irq_chip_set_vcpu_affinity_parent() is a pointless exercise because the
function which utilizes it searchs the domain hierarchy to find a parent
domain which has such a callback.

Remove the useless indirection.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch. The same is probably true for lots of other irq chips.
---
 arch/x86/kernel/apic/msi.c |    1 -
 1 file changed, 1 deletion(-)

--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -278,7 +278,6 @@ static struct irq_chip pci_msi_ir_contro
 	.irq_mask		= pci_msi_mask_irq,
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
-	.irq_set_vcpu_affinity	= irq_chip_set_vcpu_affinity_parent,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 07/46] x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (5 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 06/46] x86/msi: Remove pointless vcpu_affinity callback Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 08/46] x86/irq: Add allocation type for parent domain retrieval Thomas Gleixner
                   ` (46 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/hw_irq.h       |    4 ++--
 arch/x86/kernel/apic/msi.c          |    6 +++---
 drivers/iommu/amd/iommu.c           |   24 ++++++++++++------------
 drivers/iommu/intel/irq_remapping.c |   18 +++++++++---------
 4 files changed, 26 insertions(+), 26 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -36,8 +36,8 @@ struct msi_desc;
 enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_IOAPIC = 1,
 	X86_IRQ_ALLOC_TYPE_HPET,
-	X86_IRQ_ALLOC_TYPE_MSI,
-	X86_IRQ_ALLOC_TYPE_MSIX,
+	X86_IRQ_ALLOC_TYPE_PCI_MSI,
+	X86_IRQ_ALLOC_TYPE_PCI_MSIX,
 	X86_IRQ_ALLOC_TYPE_DMAR,
 	X86_IRQ_ALLOC_TYPE_UV,
 };
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -188,7 +188,7 @@ int native_setup_msi_irqs(struct pci_dev
 	struct irq_alloc_info info;
 
 	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_MSI;
+	info.type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
 	info.msi_dev = dev;
 
 	domain = irq_remapping_get_irq_domain(&info);
@@ -220,9 +220,9 @@ int pci_msi_prepare(struct irq_domain *d
 	init_irq_alloc_info(arg, NULL);
 	arg->msi_dev = pdev;
 	if (desc->msi_attrib.is_msix) {
-		arg->type = X86_IRQ_ALLOC_TYPE_MSIX;
+		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
 	} else {
-		arg->type = X86_IRQ_ALLOC_TYPE_MSI;
+		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
 		arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
 	}
 
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3514,8 +3514,8 @@ static int get_devid(struct irq_alloc_in
 	case X86_IRQ_ALLOC_TYPE_HPET:
 		devid     = get_hpet_devid(info->hpet_id);
 		break;
-	case X86_IRQ_ALLOC_TYPE_MSI:
-	case X86_IRQ_ALLOC_TYPE_MSIX:
+	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
 		devid = get_device_id(&info->msi_dev->dev);
 		break;
 	default:
@@ -3553,8 +3553,8 @@ static struct irq_domain *get_irq_domain
 		return NULL;
 
 	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_MSI:
-	case X86_IRQ_ALLOC_TYPE_MSIX:
+	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
 		devid = get_device_id(&info->msi_dev->dev);
 		if (devid < 0)
 			return NULL;
@@ -3615,8 +3615,8 @@ static void irq_remapping_prepare_irte(s
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
-	case X86_IRQ_ALLOC_TYPE_MSI:
-	case X86_IRQ_ALLOC_TYPE_MSIX:
+	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
 		msg->address_hi = MSI_ADDR_BASE_HI;
 		msg->address_lo = MSI_ADDR_BASE_LO;
 		msg->data = irte_info->index;
@@ -3660,15 +3660,15 @@ static int irq_remapping_alloc(struct ir
 
 	if (!info)
 		return -EINVAL;
-	if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_MSI &&
-	    info->type != X86_IRQ_ALLOC_TYPE_MSIX)
+	if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI &&
+	    info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX)
 		return -EINVAL;
 
 	/*
 	 * With IRQ remapping enabled, don't need contiguous CPU vectors
 	 * to support multiple MSI interrupts.
 	 */
-	if (info->type == X86_IRQ_ALLOC_TYPE_MSI)
+	if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI)
 		info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
 
 	devid = get_devid(info);
@@ -3700,9 +3700,9 @@ static int irq_remapping_alloc(struct ir
 		} else {
 			index = -ENOMEM;
 		}
-	} else if (info->type == X86_IRQ_ALLOC_TYPE_MSI ||
-		   info->type == X86_IRQ_ALLOC_TYPE_MSIX) {
-		bool align = (info->type == X86_IRQ_ALLOC_TYPE_MSI);
+	} else if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI ||
+		   info->type == X86_IRQ_ALLOC_TYPE_PCI_MSIX) {
+		bool align = (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI);
 
 		index = alloc_irq_index(devid, nr_irqs, align, info->msi_dev);
 	} else {
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1115,8 +1115,8 @@ static struct irq_domain *intel_get_ir_i
 	case X86_IRQ_ALLOC_TYPE_HPET:
 		iommu = map_hpet_to_ir(info->hpet_id);
 		break;
-	case X86_IRQ_ALLOC_TYPE_MSI:
-	case X86_IRQ_ALLOC_TYPE_MSIX:
+	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
 		iommu = map_dev_to_ir(info->msi_dev);
 		break;
 	default:
@@ -1135,8 +1135,8 @@ static struct irq_domain *intel_get_irq_
 		return NULL;
 
 	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_MSI:
-	case X86_IRQ_ALLOC_TYPE_MSIX:
+	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
 		iommu = map_dev_to_ir(info->msi_dev);
 		if (iommu)
 			return iommu->ir_msi_domain;
@@ -1306,8 +1306,8 @@ static void intel_irq_remapping_prepare_
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
-	case X86_IRQ_ALLOC_TYPE_MSI:
-	case X86_IRQ_ALLOC_TYPE_MSIX:
+	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
 		if (info->type == X86_IRQ_ALLOC_TYPE_HPET)
 			set_hpet_sid(irte, info->hpet_id);
 		else
@@ -1362,15 +1362,15 @@ static int intel_irq_remapping_alloc(str
 
 	if (!info || !iommu)
 		return -EINVAL;
-	if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_MSI &&
-	    info->type != X86_IRQ_ALLOC_TYPE_MSIX)
+	if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI &&
+	    info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX)
 		return -EINVAL;
 
 	/*
 	 * With IRQ remapping enabled, don't need contiguous CPU vectors
 	 * to support multiple MSI interrupts.
 	 */
-	if (info->type == X86_IRQ_ALLOC_TYPE_MSI)
+	if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI)
 		info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
 
 	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 08/46] x86/irq: Add allocation type for parent domain retrieval
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (6 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 07/46] x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 09/46] iommu/vt-d: Consolidate irq domain getter Thomas Gleixner
                   ` (45 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

irq_remapping_ir_irq_domain() is used to retrieve the remapping parent
domain for an allocation type. irq_remapping_irq_domain() is for retrieving
the actual device domain for allocating interrupts for a device.

The two functions are similar and can be unified by using explicit modes
for parent irq domain retrieval.

Add X86_IRQ_ALLOC_TYPE_IOAPIC/HPET_GET_PARENT and use it in the iommu
implementations. Drop the parent domain retrieval for PCI_MSI/X as that is
unused.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/hw_irq.h       |    2 ++
 arch/x86/kernel/apic/io_apic.c      |    2 +-
 arch/x86/kernel/apic/msi.c          |    2 +-
 drivers/iommu/amd/iommu.c           |    8 ++++++++
 drivers/iommu/hyperv-iommu.c        |    2 +-
 drivers/iommu/intel/irq_remapping.c |    8 ++------
 6 files changed, 15 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -40,6 +40,8 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_PCI_MSIX,
 	X86_IRQ_ALLOC_TYPE_DMAR,
 	X86_IRQ_ALLOC_TYPE_UV,
+	X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT,
+	X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
 struct irq_alloc_info {
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2296,7 +2296,7 @@ static int mp_irqdomain_create(int ioapi
 		return 0;
 
 	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_IOAPIC;
+	info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
 	info.ioapic_id = mpc_ioapic_id(ioapic);
 	parent = irq_remapping_get_ir_irq_domain(&info);
 	if (!parent)
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -476,7 +476,7 @@ struct irq_domain *hpet_create_irq_domai
 	domain_info->data = (void *)(long)hpet_id;
 
 	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET;
+	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
 	info.hpet_id = hpet_id;
 	parent = irq_remapping_get_ir_irq_domain(&info);
 	if (parent == NULL)
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3534,6 +3534,14 @@ static struct irq_domain *get_ir_irq_dom
 	if (!info)
 		return NULL;
 
+	switch (info->type) {
+	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
+	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
+		break;
+	default:
+		return NULL;
+	}
+
 	devid = get_devid(info);
 	if (devid >= 0) {
 		iommu = amd_iommu_rlookup_table[devid];
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -184,7 +184,7 @@ static int __init hyperv_enable_irq_rema
 
 static struct irq_domain *hyperv_get_ir_irq_domain(struct irq_alloc_info *info)
 {
-	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC)
+	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT)
 		return ioapic_ir_domain;
 	else
 		return NULL;
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1109,16 +1109,12 @@ static struct irq_domain *intel_get_ir_i
 		return NULL;
 
 	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_IOAPIC:
+	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
 		iommu = map_ioapic_to_ir(info->ioapic_id);
 		break;
-	case X86_IRQ_ALLOC_TYPE_HPET:
+	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
 		iommu = map_hpet_to_ir(info->hpet_id);
 		break;
-	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
-	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		iommu = map_dev_to_ir(info->msi_dev);
-		break;
 	default:
 		BUG_ON(1);
 		break;



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 09/46] iommu/vt-d: Consolidate irq domain getter
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (7 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 08/46] x86/irq: Add allocation type for parent domain retrieval Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 10/46] iommu/amd: " Thomas Gleixner
                   ` (44 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

The irq domain request mode is now indicated in irq_alloc_info::type.

Consolidate the two getter functions into one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 drivers/iommu/intel/irq_remapping.c |   67 ++++++++++++------------------------
 1 file changed, 24 insertions(+), 43 deletions(-)

--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -204,35 +204,40 @@ static int modify_irte(struct irq_2_iomm
 	return rc;
 }
 
-static struct intel_iommu *map_hpet_to_ir(u8 hpet_id)
+static struct irq_domain *map_hpet_to_ir(u8 hpet_id)
 {
 	int i;
 
-	for (i = 0; i < MAX_HPET_TBS; i++)
+	for (i = 0; i < MAX_HPET_TBS; i++) {
 		if (ir_hpet[i].id == hpet_id && ir_hpet[i].iommu)
-			return ir_hpet[i].iommu;
+			return ir_hpet[i].iommu->ir_domain;
+	}
 	return NULL;
 }
 
-static struct intel_iommu *map_ioapic_to_ir(int apic)
+static struct intel_iommu *map_ioapic_to_iommu(int apic)
 {
 	int i;
 
-	for (i = 0; i < MAX_IO_APICS; i++)
+	for (i = 0; i < MAX_IO_APICS; i++) {
 		if (ir_ioapic[i].id == apic && ir_ioapic[i].iommu)
 			return ir_ioapic[i].iommu;
+	}
 	return NULL;
 }
 
-static struct intel_iommu *map_dev_to_ir(struct pci_dev *dev)
+static struct irq_domain *map_ioapic_to_ir(int apic)
 {
-	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu = map_ioapic_to_iommu(apic);
 
-	drhd = dmar_find_matched_drhd_unit(dev);
-	if (!drhd)
-		return NULL;
+	return iommu ? iommu->ir_domain : NULL;
+}
+
+static struct irq_domain *map_dev_to_ir(struct pci_dev *dev)
+{
+	struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev);
 
-	return drhd->iommu;
+	return drhd ? drhd->iommu->ir_msi_domain : NULL;
 }
 
 static int clear_entries(struct irq_2_iommu *irq_iommu)
@@ -996,7 +1001,7 @@ static int __init parse_ioapics_under_ir
 
 	for (ioapic_idx = 0; ioapic_idx < nr_ioapics; ioapic_idx++) {
 		int ioapic_id = mpc_ioapic_id(ioapic_idx);
-		if (!map_ioapic_to_ir(ioapic_id)) {
+		if (!map_ioapic_to_iommu(ioapic_id)) {
 			pr_err(FW_BUG "ioapic %d has no mapping iommu, "
 			       "interrupt remapping will be disabled\n",
 			       ioapic_id);
@@ -1101,47 +1106,23 @@ static void prepare_irte(struct irte *ir
 	irte->redir_hint = 1;
 }
 
-static struct irq_domain *intel_get_ir_irq_domain(struct irq_alloc_info *info)
+static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
 {
-	struct intel_iommu *iommu = NULL;
-
 	if (!info)
 		return NULL;
 
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-		iommu = map_ioapic_to_ir(info->ioapic_id);
-		break;
+		return map_ioapic_to_ir(info->ioapic_id);
 	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		iommu = map_hpet_to_ir(info->hpet_id);
-		break;
-	default:
-		BUG_ON(1);
-		break;
-	}
-
-	return iommu ? iommu->ir_domain : NULL;
-}
-
-static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
-{
-	struct intel_iommu *iommu;
-
-	if (!info)
-		return NULL;
-
-	switch (info->type) {
+		return map_hpet_to_ir(info->hpet_id);
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		iommu = map_dev_to_ir(info->msi_dev);
-		if (iommu)
-			return iommu->ir_msi_domain;
-		break;
+		return map_dev_to_ir(info->msi_dev);
 	default:
-		break;
+		WARN_ON_ONCE(1);
+		return NULL;
 	}
-
-	return NULL;
 }
 
 struct irq_remap_ops intel_irq_remap_ops = {
@@ -1150,7 +1131,7 @@ struct irq_remap_ops intel_irq_remap_ops
 	.disable		= disable_irq_remapping,
 	.reenable		= reenable_irq_remapping,
 	.enable_faulting	= enable_drhd_fault_handling,
-	.get_ir_irq_domain	= intel_get_ir_irq_domain,
+	.get_ir_irq_domain	= intel_get_irq_domain,
 	.get_irq_domain		= intel_get_irq_domain,
 };
 



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 10/46] iommu/amd: Consolidate irq domain getter
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (8 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 09/46] iommu/vt-d: Consolidate irq domain getter Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 11/46] iommu/irq_remapping: Consolidate irq domain lookup Thomas Gleixner
                   ` (43 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

The irq domain request mode is now indicated in irq_alloc_info::type.

Consolidate the two getter functions into one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 drivers/iommu/amd/iommu.c |   65 ++++++++++++++--------------------------------
 1 file changed, 21 insertions(+), 44 deletions(-)

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3505,77 +3505,54 @@ static void irte_ga_clear_allocated(stru
 
 static int get_devid(struct irq_alloc_info *info)
 {
-	int devid = -1;
-
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-		devid     = get_ioapic_devid(info->ioapic_id);
-		break;
+	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
+		return get_ioapic_devid(info->ioapic_id);
 	case X86_IRQ_ALLOC_TYPE_HPET:
-		devid     = get_hpet_devid(info->hpet_id);
-		break;
+	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
+		return get_hpet_devid(info->hpet_id);
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		devid = get_device_id(&info->msi_dev->dev);
-		break;
+		return get_device_id(&info->msi_dev->dev);
 	default:
-		BUG_ON(1);
-		break;
+		WARN_ON_ONCE(1);
+		return -1;
 	}
-
-	return devid;
 }
 
-static struct irq_domain *get_ir_irq_domain(struct irq_alloc_info *info)
+static struct irq_domain *get_irq_domain_for_devid(struct irq_alloc_info *info,
+						   int devid)
 {
-	struct amd_iommu *iommu;
-	int devid;
+	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
-	if (!info)
+	if (!iommu)
 		return NULL;
 
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
 	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		break;
+		return iommu->ir_domain;
+	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
+		return iommu->msi_domain;
 	default:
+		WARN_ON_ONCE(1);
 		return NULL;
 	}
-
-	devid = get_devid(info);
-	if (devid >= 0) {
-		iommu = amd_iommu_rlookup_table[devid];
-		if (iommu)
-			return iommu->ir_domain;
-	}
-
-	return NULL;
 }
 
 static struct irq_domain *get_irq_domain(struct irq_alloc_info *info)
 {
-	struct amd_iommu *iommu;
 	int devid;
 
 	if (!info)
 		return NULL;
 
-	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
-	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		devid = get_device_id(&info->msi_dev->dev);
-		if (devid < 0)
-			return NULL;
-
-		iommu = amd_iommu_rlookup_table[devid];
-		if (iommu)
-			return iommu->msi_domain;
-		break;
-	default:
-		break;
-	}
-
-	return NULL;
+	devid = get_devid(info);
+	if (devid < 0)
+		return NULL;
+	return get_irq_domain_for_devid(info, devid);
 }
 
 struct irq_remap_ops amd_iommu_irq_ops = {
@@ -3584,7 +3561,7 @@ struct irq_remap_ops amd_iommu_irq_ops =
 	.disable		= amd_iommu_disable,
 	.reenable		= amd_iommu_reenable,
 	.enable_faulting	= amd_iommu_enable_faulting,
-	.get_ir_irq_domain	= get_ir_irq_domain,
+	.get_ir_irq_domain	= get_irq_domain,
 	.get_irq_domain		= get_irq_domain,
 };
 



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 11/46] iommu/irq_remapping: Consolidate irq domain lookup
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (9 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 10/46] iommu/amd: " Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 12/46] x86/irq: Prepare consolidation of irq_alloc_info Thomas Gleixner
                   ` (42 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Now that the iommu implementations handle the X86_*_GET_PARENT_DOMAIN
types, consolidate the two getter functions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/irq_remapping.h |    8 --------
 arch/x86/kernel/apic/io_apic.c       |    2 +-
 arch/x86/kernel/apic/msi.c           |    2 +-
 drivers/iommu/amd/iommu.c            |    1 -
 drivers/iommu/hyperv-iommu.c         |    4 ++--
 drivers/iommu/intel/irq_remapping.c  |    1 -
 drivers/iommu/irq_remapping.c        |   23 +----------------------
 drivers/iommu/irq_remapping.h        |    5 +----
 8 files changed, 6 insertions(+), 40 deletions(-)

--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -45,8 +45,6 @@ extern int irq_remap_enable_fault_handli
 extern void panic_if_irq_remap(const char *msg);
 
 extern struct irq_domain *
-irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info);
-extern struct irq_domain *
 irq_remapping_get_irq_domain(struct irq_alloc_info *info);
 
 /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
@@ -74,12 +72,6 @@ static inline void panic_if_irq_remap(co
 }
 
 static inline struct irq_domain *
-irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info)
-{
-	return NULL;
-}
-
-static inline struct irq_domain *
 irq_remapping_get_irq_domain(struct irq_alloc_info *info)
 {
 	return NULL;
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2298,7 +2298,7 @@ static int mp_irqdomain_create(int ioapi
 	init_irq_alloc_info(&info, NULL);
 	info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
 	info.ioapic_id = mpc_ioapic_id(ioapic);
-	parent = irq_remapping_get_ir_irq_domain(&info);
+	parent = irq_remapping_get_irq_domain(&info);
 	if (!parent)
 		parent = x86_vector_domain;
 	else
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -478,7 +478,7 @@ struct irq_domain *hpet_create_irq_domai
 	init_irq_alloc_info(&info, NULL);
 	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
 	info.hpet_id = hpet_id;
-	parent = irq_remapping_get_ir_irq_domain(&info);
+	parent = irq_remapping_get_irq_domain(&info);
 	if (parent == NULL)
 		parent = x86_vector_domain;
 	else
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3561,7 +3561,6 @@ struct irq_remap_ops amd_iommu_irq_ops =
 	.disable		= amd_iommu_disable,
 	.reenable		= amd_iommu_reenable,
 	.enable_faulting	= amd_iommu_enable_faulting,
-	.get_ir_irq_domain	= get_irq_domain,
 	.get_irq_domain		= get_irq_domain,
 };
 
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -182,7 +182,7 @@ static int __init hyperv_enable_irq_rema
 	return IRQ_REMAP_X2APIC_MODE;
 }
 
-static struct irq_domain *hyperv_get_ir_irq_domain(struct irq_alloc_info *info)
+static struct irq_domain *hyperv_get_irq_domain(struct irq_alloc_info *info)
 {
 	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT)
 		return ioapic_ir_domain;
@@ -193,7 +193,7 @@ static struct irq_domain *hyperv_get_ir_
 struct irq_remap_ops hyperv_irq_remap_ops = {
 	.prepare		= hyperv_prepare_irq_remapping,
 	.enable			= hyperv_enable_irq_remapping,
-	.get_ir_irq_domain	= hyperv_get_ir_irq_domain,
+	.get_irq_domain		= hyperv_get_irq_domain,
 };
 
 #endif
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1131,7 +1131,6 @@ struct irq_remap_ops intel_irq_remap_ops
 	.disable		= disable_irq_remapping,
 	.reenable		= reenable_irq_remapping,
 	.enable_faulting	= enable_drhd_fault_handling,
-	.get_ir_irq_domain	= intel_get_irq_domain,
 	.get_irq_domain		= intel_get_irq_domain,
 };
 
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -160,33 +160,12 @@ void panic_if_irq_remap(const char *msg)
 }
 
 /**
- * irq_remapping_get_ir_irq_domain - Get the irqdomain associated with the IOMMU
- *				     device serving request @info
- * @info: interrupt allocation information, used to identify the IOMMU device
- *
- * It's used to get parent irqdomain for HPET and IOAPIC irqdomains.
- * Returns pointer to IRQ domain, or NULL on failure.
- */
-struct irq_domain *
-irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info)
-{
-	if (!remap_ops || !remap_ops->get_ir_irq_domain)
-		return NULL;
-
-	return remap_ops->get_ir_irq_domain(info);
-}
-
-/**
  * irq_remapping_get_irq_domain - Get the irqdomain serving the request @info
  * @info: interrupt allocation information, used to identify the IOMMU device
  *
- * There will be one PCI MSI/MSIX irqdomain associated with each interrupt
- * remapping device, so this interface is used to retrieve the PCI MSI/MSIX
- * irqdomain serving request @info.
  * Returns pointer to IRQ domain, or NULL on failure.
  */
-struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info)
+struct irq_domain *irq_remapping_get_irq_domain(struct irq_alloc_info *info)
 {
 	if (!remap_ops || !remap_ops->get_irq_domain)
 		return NULL;
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -43,10 +43,7 @@ struct irq_remap_ops {
 	/* Enable fault handling */
 	int  (*enable_faulting)(void);
 
-	/* Get the irqdomain associated the IOMMU device */
-	struct irq_domain *(*get_ir_irq_domain)(struct irq_alloc_info *);
-
-	/* Get the MSI irqdomain associated with the IOMMU device */
+	/* Get the irqdomain associated to IOMMU device */
 	struct irq_domain *(*get_irq_domain)(struct irq_alloc_info *);
 };
 



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 12/46] x86/irq: Prepare consolidation of irq_alloc_info
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (10 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 11/46] iommu/irq_remapping: Consolidate irq domain lookup Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 13/46] x86/msi: Consolidate HPET allocation Thomas Gleixner
                   ` (41 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

struct irq_alloc_info is a horrible zoo of unnamed structs in a union. Many
of the struct fields can be generic and don't have to be type specific like
hpet_id, ioapic_id...

Provide a generic set of members to prepare for the consolidation. The goal
is to make irq_alloc_info have the same basic member as the generic
msi_alloc_info so generic MSI domain ops can be reused and yet more mess
can be avoided when (non-PCI) device MSI support comes along.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/hw_irq.h |   22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -44,10 +44,25 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
+/**
+ * irq_alloc_info - X86 specific interrupt allocation info
+ * @type:	X86 specific allocation type
+ * @flags:	Flags for allocation tweaks
+ * @devid:	Device ID for allocations
+ * @hwirq:	Associated hw interrupt number in the domain
+ * @mask:	CPU mask for vector allocation
+ * @desc:	Pointer to msi descriptor
+ * @data:	Allocation specific data
+ */
 struct irq_alloc_info {
 	enum irq_alloc_type	type;
 	u32			flags;
-	const struct cpumask	*mask;	/* CPU mask for vector allocation */
+	u32			devid;
+	irq_hw_number_t		hwirq;
+	const struct cpumask	*mask;
+	struct msi_desc		*desc;
+	void			*data;
+
 	union {
 		int		unused;
 #ifdef	CONFIG_HPET_TIMER
@@ -88,11 +103,6 @@ struct irq_alloc_info {
 			char		*uv_name;
 		};
 #endif
-#if IS_ENABLED(CONFIG_VMD)
-		struct {
-			struct msi_desc *desc;
-		};
-#endif
 	};
 };
 



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 13/46] x86/msi: Consolidate HPET allocation
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (11 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 12/46] x86/irq: Prepare consolidation of irq_alloc_info Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 14/46] x86/ioapic: Consolidate IOAPIC allocation Thomas Gleixner
                   ` (40 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

None of the magic HPET fields are required in any way.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/hw_irq.h       |    7 -------
 arch/x86/kernel/apic/msi.c          |   14 +++++++-------
 drivers/iommu/amd/iommu.c           |    2 +-
 drivers/iommu/intel/irq_remapping.c |    4 ++--
 4 files changed, 10 insertions(+), 17 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -65,13 +65,6 @@ struct irq_alloc_info {
 
 	union {
 		int		unused;
-#ifdef	CONFIG_HPET_TIMER
-		struct {
-			int		hpet_id;
-			int		hpet_index;
-			void		*hpet_data;
-		};
-#endif
 #ifdef	CONFIG_PCI_MSI
 		struct {
 			struct pci_dev	*msi_dev;
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -427,7 +427,7 @@ static struct irq_chip hpet_msi_controll
 static irq_hw_number_t hpet_msi_get_hwirq(struct msi_domain_info *info,
 					  msi_alloc_info_t *arg)
 {
-	return arg->hpet_index;
+	return arg->hwirq;
 }
 
 static int hpet_msi_init(struct irq_domain *domain,
@@ -435,8 +435,8 @@ static int hpet_msi_init(struct irq_doma
 			 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
 {
 	irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
-	irq_domain_set_info(domain, virq, arg->hpet_index, info->chip, NULL,
-			    handle_edge_irq, arg->hpet_data, "edge");
+	irq_domain_set_info(domain, virq, arg->hwirq, info->chip, NULL,
+			    handle_edge_irq, arg->data, "edge");
 
 	return 0;
 }
@@ -477,7 +477,7 @@ struct irq_domain *hpet_create_irq_domai
 
 	init_irq_alloc_info(&info, NULL);
 	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-	info.hpet_id = hpet_id;
+	info.devid = hpet_id;
 	parent = irq_remapping_get_irq_domain(&info);
 	if (parent == NULL)
 		parent = x86_vector_domain;
@@ -506,9 +506,9 @@ int hpet_assign_irq(struct irq_domain *d
 
 	init_irq_alloc_info(&info, NULL);
 	info.type = X86_IRQ_ALLOC_TYPE_HPET;
-	info.hpet_data = hc;
-	info.hpet_id = hpet_dev_id(domain);
-	info.hpet_index = dev_num;
+	info.data = hc;
+	info.devid = hpet_dev_id(domain);
+	info.hwirq = dev_num;
 
 	return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
 }
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3511,7 +3511,7 @@ static int get_devid(struct irq_alloc_in
 		return get_ioapic_devid(info->ioapic_id);
 	case X86_IRQ_ALLOC_TYPE_HPET:
 	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		return get_hpet_devid(info->hpet_id);
+		return get_hpet_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
 		return get_device_id(&info->msi_dev->dev);
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1115,7 +1115,7 @@ static struct irq_domain *intel_get_irq_
 	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
 		return map_ioapic_to_ir(info->ioapic_id);
 	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		return map_hpet_to_ir(info->hpet_id);
+		return map_hpet_to_ir(info->devid);
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
 		return map_dev_to_ir(info->msi_dev);
@@ -1285,7 +1285,7 @@ static void intel_irq_remapping_prepare_
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
 		if (info->type == X86_IRQ_ALLOC_TYPE_HPET)
-			set_hpet_sid(irte, info->hpet_id);
+			set_hpet_sid(irte, info->devid);
 		else
 			set_msi_sid(irte, info->msi_dev);
 



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 14/46] x86/ioapic: Consolidate IOAPIC allocation
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (12 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 13/46] x86/msi: Consolidate HPET allocation Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-09-08 13:35   ` Wei Liu
  2020-08-26 11:16 ` [patch V2 15/46] x86/irq: Consolidate DMAR irq allocation Thomas Gleixner
                   ` (39 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Move the IOAPIC specific fields into their own struct and reuse the common
devid. Get rid of the #ifdeffery as it does not matter at all whether the
alloc info is a couple of bytes longer or not.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/hw_irq.h       |   23 ++++++-----
 arch/x86/kernel/apic/io_apic.c      |   70 ++++++++++++++++++------------------
 arch/x86/kernel/devicetree.c        |    4 +-
 drivers/iommu/amd/iommu.c           |   14 +++----
 drivers/iommu/hyperv-iommu.c        |    2 -
 drivers/iommu/intel/irq_remapping.c |   18 ++++-----
 6 files changed, 66 insertions(+), 65 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -44,6 +44,15 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
+struct ioapic_alloc_info {
+	int				pin;
+	int				node;
+	u32				trigger : 1;
+	u32				polarity : 1;
+	u32				valid : 1;
+	struct IO_APIC_route_entry	*entry;
+};
+
 /**
  * irq_alloc_info - X86 specific interrupt allocation info
  * @type:	X86 specific allocation type
@@ -53,6 +62,8 @@ enum irq_alloc_type {
  * @mask:	CPU mask for vector allocation
  * @desc:	Pointer to msi descriptor
  * @data:	Allocation specific data
+ *
+ * @ioapic:	IOAPIC specific allocation data
  */
 struct irq_alloc_info {
 	enum irq_alloc_type	type;
@@ -64,6 +75,7 @@ struct irq_alloc_info {
 	void			*data;
 
 	union {
+		struct ioapic_alloc_info	ioapic;
 		int		unused;
 #ifdef	CONFIG_PCI_MSI
 		struct {
@@ -71,17 +83,6 @@ struct irq_alloc_info {
 			irq_hw_number_t	msi_hwirq;
 		};
 #endif
-#ifdef	CONFIG_X86_IO_APIC
-		struct {
-			int		ioapic_id;
-			int		ioapic_pin;
-			int		ioapic_node;
-			u32		ioapic_trigger : 1;
-			u32		ioapic_polarity : 1;
-			u32		ioapic_valid : 1;
-			struct IO_APIC_route_entry *ioapic_entry;
-		};
-#endif
 #ifdef	CONFIG_DMAR_TABLE
 		struct {
 			int		dmar_id;
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -860,10 +860,10 @@ void ioapic_set_alloc_attr(struct irq_al
 {
 	init_irq_alloc_info(info, NULL);
 	info->type = X86_IRQ_ALLOC_TYPE_IOAPIC;
-	info->ioapic_node = node;
-	info->ioapic_trigger = trigger;
-	info->ioapic_polarity = polarity;
-	info->ioapic_valid = 1;
+	info->ioapic.node = node;
+	info->ioapic.trigger = trigger;
+	info->ioapic.polarity = polarity;
+	info->ioapic.valid = 1;
 }
 
 #ifndef CONFIG_ACPI
@@ -878,32 +878,32 @@ static void ioapic_copy_alloc_attr(struc
 
 	copy_irq_alloc_info(dst, src);
 	dst->type = X86_IRQ_ALLOC_TYPE_IOAPIC;
-	dst->ioapic_id = mpc_ioapic_id(ioapic_idx);
-	dst->ioapic_pin = pin;
-	dst->ioapic_valid = 1;
-	if (src && src->ioapic_valid) {
-		dst->ioapic_node = src->ioapic_node;
-		dst->ioapic_trigger = src->ioapic_trigger;
-		dst->ioapic_polarity = src->ioapic_polarity;
+	dst->devid = mpc_ioapic_id(ioapic_idx);
+	dst->ioapic.pin = pin;
+	dst->ioapic.valid = 1;
+	if (src && src->ioapic.valid) {
+		dst->ioapic.node = src->ioapic.node;
+		dst->ioapic.trigger = src->ioapic.trigger;
+		dst->ioapic.polarity = src->ioapic.polarity;
 	} else {
-		dst->ioapic_node = NUMA_NO_NODE;
+		dst->ioapic.node = NUMA_NO_NODE;
 		if (acpi_get_override_irq(gsi, &trigger, &polarity) >= 0) {
-			dst->ioapic_trigger = trigger;
-			dst->ioapic_polarity = polarity;
+			dst->ioapic.trigger = trigger;
+			dst->ioapic.polarity = polarity;
 		} else {
 			/*
 			 * PCI interrupts are always active low level
 			 * triggered.
 			 */
-			dst->ioapic_trigger = IOAPIC_LEVEL;
-			dst->ioapic_polarity = IOAPIC_POL_LOW;
+			dst->ioapic.trigger = IOAPIC_LEVEL;
+			dst->ioapic.polarity = IOAPIC_POL_LOW;
 		}
 	}
 }
 
 static int ioapic_alloc_attr_node(struct irq_alloc_info *info)
 {
-	return (info && info->ioapic_valid) ? info->ioapic_node : NUMA_NO_NODE;
+	return (info && info->ioapic.valid) ? info->ioapic.node : NUMA_NO_NODE;
 }
 
 static void mp_register_handler(unsigned int irq, unsigned long trigger)
@@ -933,14 +933,14 @@ static bool mp_check_pin_attr(int irq, s
 	 * pin with real trigger and polarity attributes.
 	 */
 	if (irq < nr_legacy_irqs() && data->count == 1) {
-		if (info->ioapic_trigger != data->trigger)
-			mp_register_handler(irq, info->ioapic_trigger);
-		data->entry.trigger = data->trigger = info->ioapic_trigger;
-		data->entry.polarity = data->polarity = info->ioapic_polarity;
+		if (info->ioapic.trigger != data->trigger)
+			mp_register_handler(irq, info->ioapic.trigger);
+		data->entry.trigger = data->trigger = info->ioapic.trigger;
+		data->entry.polarity = data->polarity = info->ioapic.polarity;
 	}
 
-	return data->trigger == info->ioapic_trigger &&
-	       data->polarity == info->ioapic_polarity;
+	return data->trigger == info->ioapic.trigger &&
+	       data->polarity == info->ioapic.polarity;
 }
 
 static int alloc_irq_from_domain(struct irq_domain *domain, int ioapic, u32 gsi,
@@ -1002,7 +1002,7 @@ static int alloc_isa_irq_from_domain(str
 		if (!mp_check_pin_attr(irq, info))
 			return -EBUSY;
 		if (__add_pin_to_irq_node(irq_data->chip_data, node, ioapic,
-					  info->ioapic_pin))
+					  info->ioapic.pin))
 			return -ENOMEM;
 	} else {
 		info->flags |= X86_IRQ_ALLOC_LEGACY;
@@ -2092,8 +2092,8 @@ static int mp_alloc_timer_irq(int ioapic
 		struct irq_alloc_info info;
 
 		ioapic_set_alloc_attr(&info, NUMA_NO_NODE, 0, 0);
-		info.ioapic_id = mpc_ioapic_id(ioapic);
-		info.ioapic_pin = pin;
+		info.devid = mpc_ioapic_id(ioapic);
+		info.ioapic.pin = pin;
 		mutex_lock(&ioapic_mutex);
 		irq = alloc_isa_irq_from_domain(domain, 0, ioapic, pin, &info);
 		mutex_unlock(&ioapic_mutex);
@@ -2297,7 +2297,7 @@ static int mp_irqdomain_create(int ioapi
 
 	init_irq_alloc_info(&info, NULL);
 	info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
-	info.ioapic_id = mpc_ioapic_id(ioapic);
+	info.devid = mpc_ioapic_id(ioapic);
 	parent = irq_remapping_get_irq_domain(&info);
 	if (!parent)
 		parent = x86_vector_domain;
@@ -2932,9 +2932,9 @@ int mp_ioapic_registered(u32 gsi_base)
 static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data,
 				  struct irq_alloc_info *info)
 {
-	if (info && info->ioapic_valid) {
-		data->trigger = info->ioapic_trigger;
-		data->polarity = info->ioapic_polarity;
+	if (info && info->ioapic.valid) {
+		data->trigger = info->ioapic.trigger;
+		data->polarity = info->ioapic.polarity;
 	} else if (acpi_get_override_irq(gsi, &data->trigger,
 					 &data->polarity) < 0) {
 		/* PCI interrupts are always active low level triggered. */
@@ -2980,7 +2980,7 @@ int mp_irqdomain_alloc(struct irq_domain
 		return -EINVAL;
 
 	ioapic = mp_irqdomain_ioapic_idx(domain);
-	pin = info->ioapic_pin;
+	pin = info->ioapic.pin;
 	if (irq_find_mapping(domain, (irq_hw_number_t)pin) > 0)
 		return -EEXIST;
 
@@ -2988,7 +2988,7 @@ int mp_irqdomain_alloc(struct irq_domain
 	if (!data)
 		return -ENOMEM;
 
-	info->ioapic_entry = &data->entry;
+	info->ioapic.entry = &data->entry;
 	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info);
 	if (ret < 0) {
 		kfree(data);
@@ -2996,7 +2996,7 @@ int mp_irqdomain_alloc(struct irq_domain
 	}
 
 	INIT_LIST_HEAD(&data->irq_2_pin);
-	irq_data->hwirq = info->ioapic_pin;
+	irq_data->hwirq = info->ioapic.pin;
 	irq_data->chip = (domain->parent == x86_vector_domain) ?
 			  &ioapic_chip : &ioapic_ir_chip;
 	irq_data->chip_data = data;
@@ -3006,8 +3006,8 @@ int mp_irqdomain_alloc(struct irq_domain
 	add_pin_to_irq_node(data, ioapic_alloc_attr_node(info), ioapic, pin);
 
 	local_irq_save(flags);
-	if (info->ioapic_entry)
-		mp_setup_entry(cfg, data, info->ioapic_entry);
+	if (info->ioapic.entry)
+		mp_setup_entry(cfg, data, info->ioapic.entry);
 	mp_register_handler(virq, data->trigger);
 	if (virq < nr_legacy_irqs())
 		legacy_pic->mask(virq);
--- a/arch/x86/kernel/devicetree.c
+++ b/arch/x86/kernel/devicetree.c
@@ -229,8 +229,8 @@ static int dt_irqdomain_alloc(struct irq
 
 	it = &of_ioapic_type[type_index];
 	ioapic_set_alloc_attr(&tmp, NUMA_NO_NODE, it->trigger, it->polarity);
-	tmp.ioapic_id = mpc_ioapic_id(mp_irqdomain_ioapic_idx(domain));
-	tmp.ioapic_pin = fwspec->param[0];
+	tmp.devid = mpc_ioapic_id(mp_irqdomain_ioapic_idx(domain));
+	tmp.ioapic.pin = fwspec->param[0];
 
 	return mp_irqdomain_alloc(domain, virq, nr_irqs, &tmp);
 }
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3508,7 +3508,7 @@ static int get_devid(struct irq_alloc_in
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
 	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-		return get_ioapic_devid(info->ioapic_id);
+		return get_ioapic_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_HPET:
 	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
 		return get_hpet_devid(info->devid);
@@ -3586,15 +3586,15 @@ static void irq_remapping_prepare_irte(s
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
 		/* Setup IOAPIC entry */
-		entry = info->ioapic_entry;
-		info->ioapic_entry = NULL;
+		entry = info->ioapic.entry;
+		info->ioapic.entry = NULL;
 		memset(entry, 0, sizeof(*entry));
 		entry->vector        = index;
 		entry->mask          = 0;
-		entry->trigger       = info->ioapic_trigger;
-		entry->polarity      = info->ioapic_polarity;
+		entry->trigger       = info->ioapic.trigger;
+		entry->polarity      = info->ioapic.polarity;
 		/* Mask level triggered irqs. */
-		if (info->ioapic_trigger)
+		if (info->ioapic.trigger)
 			entry->mask = 1;
 		break;
 
@@ -3680,7 +3680,7 @@ static int irq_remapping_alloc(struct ir
 					iommu->irte_ops->set_allocated(table, i);
 			}
 			WARN_ON(table->min_index != 32);
-			index = info->ioapic_pin;
+			index = info->ioapic.pin;
 		} else {
 			index = -ENOMEM;
 		}
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -101,7 +101,7 @@ static int hyperv_irq_remapping_alloc(st
 	 * in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_
 	 * affinity() set vector and dest_apicid directly into IO-APIC entry.
 	 */
-	irq_data->chip_data = info->ioapic_entry;
+	irq_data->chip_data = info->ioapic.entry;
 
 	/*
 	 * Hypver-V IO APIC irq affinity should be in the scope of
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1113,7 +1113,7 @@ static struct irq_domain *intel_get_irq_
 
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-		return map_ioapic_to_ir(info->ioapic_id);
+		return map_ioapic_to_ir(info->devid);
 	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
 		return map_hpet_to_ir(info->devid);
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
@@ -1254,16 +1254,16 @@ static void intel_irq_remapping_prepare_
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
 		/* Set source-id of interrupt request */
-		set_ioapic_sid(irte, info->ioapic_id);
+		set_ioapic_sid(irte, info->devid);
 		apic_printk(APIC_VERBOSE, KERN_DEBUG "IOAPIC[%d]: Set IRTE entry (P:%d FPD:%d Dst_Mode:%d Redir_hint:%d Trig_Mode:%d Dlvry_Mode:%X Avail:%X Vector:%02X Dest:%08X SID:%04X SQ:%X SVT:%X)\n",
-			info->ioapic_id, irte->present, irte->fpd,
+			info->devid, irte->present, irte->fpd,
 			irte->dst_mode, irte->redir_hint,
 			irte->trigger_mode, irte->dlvry_mode,
 			irte->avail, irte->vector, irte->dest_id,
 			irte->sid, irte->sq, irte->svt);
 
-		entry = (struct IR_IO_APIC_route_entry *)info->ioapic_entry;
-		info->ioapic_entry = NULL;
+		entry = (struct IR_IO_APIC_route_entry *)info->ioapic.entry;
+		info->ioapic.entry = NULL;
 		memset(entry, 0, sizeof(*entry));
 		entry->index2	= (index >> 15) & 0x1;
 		entry->zero	= 0;
@@ -1273,11 +1273,11 @@ static void intel_irq_remapping_prepare_
 		 * IO-APIC RTE will be configured with virtual vector.
 		 * irq handler will do the explicit EOI to the io-apic.
 		 */
-		entry->vector	= info->ioapic_pin;
+		entry->vector	= info->ioapic.pin;
 		entry->mask	= 0;			/* enable IRQ */
-		entry->trigger	= info->ioapic_trigger;
-		entry->polarity	= info->ioapic_polarity;
-		if (info->ioapic_trigger)
+		entry->trigger	= info->ioapic.trigger;
+		entry->polarity	= info->ioapic.polarity;
+		if (info->ioapic.trigger)
 			entry->mask = 1; /* Mask level triggered irqs. */
 		break;
 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 15/46] x86/irq: Consolidate DMAR irq allocation
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (13 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 14/46] x86/ioapic: Consolidate IOAPIC allocation Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 16:50   ` Dey, Megha
  2020-08-26 11:16 ` [patch V2 16/46] x86/irq: Consolidate UV domain allocation Thomas Gleixner
                   ` (38 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

None of the DMAR specific fields are required.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/hw_irq.h |    6 ------
 arch/x86/kernel/apic/msi.c    |   10 +++++-----
 2 files changed, 5 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -83,12 +83,6 @@ struct irq_alloc_info {
 			irq_hw_number_t	msi_hwirq;
 		};
 #endif
-#ifdef	CONFIG_DMAR_TABLE
-		struct {
-			int		dmar_id;
-			void		*dmar_data;
-		};
-#endif
 #ifdef	CONFIG_X86_UV
 		struct {
 			int		uv_limit;
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -329,15 +329,15 @@ static struct irq_chip dmar_msi_controll
 static irq_hw_number_t dmar_msi_get_hwirq(struct msi_domain_info *info,
 					  msi_alloc_info_t *arg)
 {
-	return arg->dmar_id;
+	return arg->hwirq;
 }
 
 static int dmar_msi_init(struct irq_domain *domain,
 			 struct msi_domain_info *info, unsigned int virq,
 			 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
 {
-	irq_domain_set_info(domain, virq, arg->dmar_id, info->chip, NULL,
-			    handle_edge_irq, arg->dmar_data, "edge");
+	irq_domain_set_info(domain, virq, arg->devid, info->chip, NULL,
+			    handle_edge_irq, arg->data, "edge");
 
 	return 0;
 }
@@ -384,8 +384,8 @@ int dmar_alloc_hwirq(int id, int node, v
 
 	init_irq_alloc_info(&info, NULL);
 	info.type = X86_IRQ_ALLOC_TYPE_DMAR;
-	info.dmar_id = id;
-	info.dmar_data = arg;
+	info.devid = id;
+	info.data = arg;
 
 	return irq_domain_alloc_irqs(domain, 1, node, &info);
 }



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 16/46] x86/irq: Consolidate UV domain allocation
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (14 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 15/46] x86/irq: Consolidate DMAR irq allocation Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 17/46] PCI/MSI: Rework pci_msi_domain_calc_hwirq() Thomas Gleixner
                   ` (37 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Move the UV specific fields into their own struct for readability sake. Get
rid of the #ifdeffery as it does not matter at all whether the alloc info
is a couple of bytes longer or not.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/hw_irq.h |   21 ++++++++++++---------
 arch/x86/platform/uv/uv_irq.c |   16 ++++++++--------
 2 files changed, 20 insertions(+), 17 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -53,6 +53,14 @@ struct ioapic_alloc_info {
 	struct IO_APIC_route_entry	*entry;
 };
 
+struct uv_alloc_info {
+	int		limit;
+	int		blade;
+	unsigned long	offset;
+	char		*name;
+
+};
+
 /**
  * irq_alloc_info - X86 specific interrupt allocation info
  * @type:	X86 specific allocation type
@@ -64,7 +72,8 @@ struct ioapic_alloc_info {
  * @data:	Allocation specific data
  *
  * @ioapic:	IOAPIC specific allocation data
- */
+ * @uv:		UV specific allocation data
+*/
 struct irq_alloc_info {
 	enum irq_alloc_type	type;
 	u32			flags;
@@ -76,6 +85,8 @@ struct irq_alloc_info {
 
 	union {
 		struct ioapic_alloc_info	ioapic;
+		struct uv_alloc_info		uv;
+
 		int		unused;
 #ifdef	CONFIG_PCI_MSI
 		struct {
@@ -83,14 +94,6 @@ struct irq_alloc_info {
 			irq_hw_number_t	msi_hwirq;
 		};
 #endif
-#ifdef	CONFIG_X86_UV
-		struct {
-			int		uv_limit;
-			int		uv_blade;
-			unsigned long	uv_offset;
-			char		*uv_name;
-		};
-#endif
 	};
 };
 
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -90,15 +90,15 @@ static int uv_domain_alloc(struct irq_do
 
 	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
 	if (ret >= 0) {
-		if (info->uv_limit == UV_AFFINITY_CPU)
+		if (info->uv.limit == UV_AFFINITY_CPU)
 			irq_set_status_flags(virq, IRQ_NO_BALANCING);
 		else
 			irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
 
-		chip_data->pnode = uv_blade_to_pnode(info->uv_blade);
-		chip_data->offset = info->uv_offset;
+		chip_data->pnode = uv_blade_to_pnode(info->uv.blade);
+		chip_data->offset = info->uv.offset;
 		irq_domain_set_info(domain, virq, virq, &uv_irq_chip, chip_data,
-				    handle_percpu_irq, NULL, info->uv_name);
+				    handle_percpu_irq, NULL, info->uv.name);
 	} else {
 		kfree(chip_data);
 	}
@@ -193,10 +193,10 @@ int uv_setup_irq(char *irq_name, int cpu
 
 	init_irq_alloc_info(&info, cpumask_of(cpu));
 	info.type = X86_IRQ_ALLOC_TYPE_UV;
-	info.uv_limit = limit;
-	info.uv_blade = mmr_blade;
-	info.uv_offset = mmr_offset;
-	info.uv_name = irq_name;
+	info.uv.limit = limit;
+	info.uv.blade = mmr_blade;
+	info.uv.offset = mmr_offset;
+	info.uv.name = irq_name;
 
 	return irq_domain_alloc_irqs(domain, 1,
 				     uv_blade_to_memory_nid(mmr_blade), &info);



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 17/46] PCI/MSI: Rework pci_msi_domain_calc_hwirq()
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (15 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 16/46] x86/irq: Consolidate UV domain allocation Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 20:24   ` Marc Zyngier
  2020-08-26 11:16 ` [patch V2 18/46] x86/msi: Consolidate MSI allocation Thomas Gleixner
                   ` (36 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Retrieve the PCI device from the msi descriptor instead of doing so at the
call sites.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
V2: Address Bjorns comments (subject prefix, pdev/dev)
---
 arch/x86/kernel/apic/msi.c |    2 +-
 drivers/pci/msi.c          |    9 ++++-----
 include/linux/msi.h        |    3 +--
 3 files changed, 6 insertions(+), 8 deletions(-)

--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -232,7 +232,7 @@ EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
 void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
 {
-	arg->msi_hwirq = pci_msi_domain_calc_hwirq(arg->msi_dev, desc);
+	arg->msi_hwirq = pci_msi_domain_calc_hwirq(desc);
 }
 EXPORT_SYMBOL_GPL(pci_msi_set_desc);
 
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1346,14 +1346,14 @@ void pci_msi_domain_write_msg(struct irq
 
 /**
  * pci_msi_domain_calc_hwirq - Generate a unique ID for an MSI source
- * @dev:	Pointer to the PCI device
  * @desc:	Pointer to the MSI descriptor
  *
  * The ID number is only used within the irqdomain.
  */
-irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev *dev,
-					  struct msi_desc *desc)
+irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc)
 {
+	struct pci_dev *dev = msi_desc_to_pci_dev(desc);
+
 	return (irq_hw_number_t)desc->msi_attrib.entry_nr |
 		pci_dev_id(dev) << 11 |
 		(pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 27;
@@ -1406,8 +1406,7 @@ static void pci_msi_domain_set_desc(msi_
 				    struct msi_desc *desc)
 {
 	arg->desc = desc;
-	arg->hwirq = pci_msi_domain_calc_hwirq(msi_desc_to_pci_dev(desc),
-					       desc);
+	arg->hwirq = pci_msi_domain_calc_hwirq(desc);
 }
 #else
 #define pci_msi_domain_set_desc		NULL
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -369,8 +369,7 @@ void pci_msi_domain_write_msg(struct irq
 struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
 					     struct msi_domain_info *info,
 					     struct irq_domain *parent);
-irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev *dev,
-					  struct msi_desc *desc);
+irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc);
 int pci_msi_domain_check_cap(struct irq_domain *domain,
 			     struct msi_domain_info *info, struct device *dev);
 u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 18/46] x86/msi: Consolidate MSI allocation
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (16 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 17/46] PCI/MSI: Rework pci_msi_domain_calc_hwirq() Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-09-08 13:36   ` Wei Liu
  2020-08-26 11:16 ` [patch V2 19/46] x86/msi: Use generic MSI domain ops Thomas Gleixner
                   ` (35 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Convert the interrupt remap drivers to retrieve the pci device from the msi
descriptor and use info::hwirq.

This is the first step to prepare x86 for using the generic MSI domain ops.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/hw_irq.h       |    8 --------
 arch/x86/kernel/apic/msi.c          |    7 +++----
 drivers/iommu/amd/iommu.c           |    5 +++--
 drivers/iommu/intel/irq_remapping.c |    4 ++--
 drivers/pci/controller/pci-hyperv.c |    2 +-
 5 files changed, 9 insertions(+), 17 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -85,14 +85,6 @@ struct irq_alloc_info {
 	union {
 		struct ioapic_alloc_info	ioapic;
 		struct uv_alloc_info		uv;
-
-		int		unused;
-#ifdef	CONFIG_PCI_MSI
-		struct {
-			struct pci_dev	*msi_dev;
-			irq_hw_number_t	msi_hwirq;
-		};
-#endif
 	};
 };
 
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -189,7 +189,6 @@ int native_setup_msi_irqs(struct pci_dev
 
 	init_irq_alloc_info(&info, NULL);
 	info.type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
-	info.msi_dev = dev;
 
 	domain = irq_remapping_get_irq_domain(&info);
 	if (domain == NULL)
@@ -208,7 +207,7 @@ void native_teardown_msi_irq(unsigned in
 static irq_hw_number_t pci_msi_get_hwirq(struct msi_domain_info *info,
 					 msi_alloc_info_t *arg)
 {
-	return arg->msi_hwirq;
+	return arg->hwirq;
 }
 
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
@@ -218,7 +217,6 @@ int pci_msi_prepare(struct irq_domain *d
 	struct msi_desc *desc = first_pci_msi_entry(pdev);
 
 	init_irq_alloc_info(arg, NULL);
-	arg->msi_dev = pdev;
 	if (desc->msi_attrib.is_msix) {
 		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
 	} else {
@@ -232,7 +230,8 @@ EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
 void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
 {
-	arg->msi_hwirq = pci_msi_domain_calc_hwirq(desc);
+	arg->desc = desc;
+	arg->hwirq = pci_msi_domain_calc_hwirq(desc);
 }
 EXPORT_SYMBOL_GPL(pci_msi_set_desc);
 
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3514,7 +3514,7 @@ static int get_devid(struct irq_alloc_in
 		return get_hpet_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		return get_device_id(&info->msi_dev->dev);
+		return get_device_id(msi_desc_to_dev(info->desc));
 	default:
 		WARN_ON_ONCE(1);
 		return -1;
@@ -3688,7 +3688,8 @@ static int irq_remapping_alloc(struct ir
 		   info->type == X86_IRQ_ALLOC_TYPE_PCI_MSIX) {
 		bool align = (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI);
 
-		index = alloc_irq_index(devid, nr_irqs, align, info->msi_dev);
+		index = alloc_irq_index(devid, nr_irqs, align,
+					msi_desc_to_pci_dev(info->desc));
 	} else {
 		index = alloc_irq_index(devid, nr_irqs, false, NULL);
 	}
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1118,7 +1118,7 @@ static struct irq_domain *intel_get_irq_
 		return map_hpet_to_ir(info->devid);
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		return map_dev_to_ir(info->msi_dev);
+		return map_dev_to_ir(msi_desc_to_pci_dev(info->desc));
 	default:
 		WARN_ON_ONCE(1);
 		return NULL;
@@ -1287,7 +1287,7 @@ static void intel_irq_remapping_prepare_
 		if (info->type == X86_IRQ_ALLOC_TYPE_HPET)
 			set_hpet_sid(irte, info->devid);
 		else
-			set_msi_sid(irte, info->msi_dev);
+			set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
 
 		msg->address_hi = MSI_ADDR_BASE_HI;
 		msg->data = sub_handle;
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1534,7 +1534,7 @@ static struct irq_chip hv_msi_irq_chip =
 static irq_hw_number_t hv_msi_domain_ops_get_hwirq(struct msi_domain_info *info,
 						   msi_alloc_info_t *arg)
 {
-	return arg->msi_hwirq;
+	return arg->hwirq;
 }
 
 static struct msi_domain_ops hv_msi_ops = {



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 19/46] x86/msi: Use generic MSI domain ops
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (17 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 18/46] x86/msi: Consolidate MSI allocation Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 20:21   ` Marc Zyngier
  2020-08-26 11:16 ` [patch V2 20/46] x86/irq: Move apic_post_init() invocation to one place Thomas Gleixner
                   ` (34 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

pci_msi_get_hwirq() and pci_msi_set_desc are not longer special. Enable the
generic MSI domain ops in the core and PCI MSI code unconditionally and get
rid of the x86 specific implementations in the X86 MSI code and in the
hyperv PCI driver.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/msi.h          |    2 --
 arch/x86/kernel/apic/msi.c          |   15 ---------------
 drivers/pci/controller/pci-hyperv.c |    8 --------
 drivers/pci/msi.c                   |    4 ----
 kernel/irq/msi.c                    |    6 ------
 5 files changed, 35 deletions(-)

--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -9,6 +9,4 @@ typedef struct irq_alloc_info msi_alloc_
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 		    msi_alloc_info_t *arg);
 
-void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc);
-
 #endif /* _ASM_X86_MSI_H */
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -204,12 +204,6 @@ void native_teardown_msi_irq(unsigned in
 	irq_domain_free_irqs(irq, 1);
 }
 
-static irq_hw_number_t pci_msi_get_hwirq(struct msi_domain_info *info,
-					 msi_alloc_info_t *arg)
-{
-	return arg->hwirq;
-}
-
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 		    msi_alloc_info_t *arg)
 {
@@ -228,17 +222,8 @@ int pci_msi_prepare(struct irq_domain *d
 }
 EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
-void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
-{
-	arg->desc = desc;
-	arg->hwirq = pci_msi_domain_calc_hwirq(desc);
-}
-EXPORT_SYMBOL_GPL(pci_msi_set_desc);
-
 static struct msi_domain_ops pci_msi_domain_ops = {
-	.get_hwirq	= pci_msi_get_hwirq,
 	.msi_prepare	= pci_msi_prepare,
-	.set_desc	= pci_msi_set_desc,
 };
 
 static struct msi_domain_info pci_msi_domain_info = {
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1531,16 +1531,8 @@ static struct irq_chip hv_msi_irq_chip =
 	.irq_unmask		= hv_irq_unmask,
 };
 
-static irq_hw_number_t hv_msi_domain_ops_get_hwirq(struct msi_domain_info *info,
-						   msi_alloc_info_t *arg)
-{
-	return arg->hwirq;
-}
-
 static struct msi_domain_ops hv_msi_ops = {
-	.get_hwirq	= hv_msi_domain_ops_get_hwirq,
 	.msi_prepare	= pci_msi_prepare,
-	.set_desc	= pci_msi_set_desc,
 	.msi_free	= hv_msi_free,
 };
 
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1401,16 +1401,12 @@ static int pci_msi_domain_handle_error(s
 	return error;
 }
 
-#ifdef GENERIC_MSI_DOMAIN_OPS
 static void pci_msi_domain_set_desc(msi_alloc_info_t *arg,
 				    struct msi_desc *desc)
 {
 	arg->desc = desc;
 	arg->hwirq = pci_msi_domain_calc_hwirq(desc);
 }
-#else
-#define pci_msi_domain_set_desc		NULL
-#endif
 
 static struct msi_domain_ops pci_msi_domain_ops_default = {
 	.set_desc	= pci_msi_domain_set_desc,
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -187,7 +187,6 @@ static const struct irq_domain_ops msi_d
 	.deactivate	= msi_domain_deactivate,
 };
 
-#ifdef GENERIC_MSI_DOMAIN_OPS
 static irq_hw_number_t msi_domain_ops_get_hwirq(struct msi_domain_info *info,
 						msi_alloc_info_t *arg)
 {
@@ -206,11 +205,6 @@ static void msi_domain_ops_set_desc(msi_
 {
 	arg->desc = desc;
 }
-#else
-#define msi_domain_ops_get_hwirq	NULL
-#define msi_domain_ops_prepare		NULL
-#define msi_domain_ops_set_desc		NULL
-#endif /* !GENERIC_MSI_DOMAIN_OPS */
 
 static int msi_domain_ops_init(struct irq_domain *domain,
 			       struct msi_domain_info *info,



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 20/46] x86/irq: Move apic_post_init() invocation to one place
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (18 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 19/46] x86/msi: Use generic MSI domain ops Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 21/46] x86/pci: Reducde #ifdeffery in PCI init code Thomas Gleixner
                   ` (33 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

No point to call it from both 32bit and 64bit implementations of
default_setup_apic_routing(). Move it to the caller.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/apic/apic.c     |    3 +++
 arch/x86/kernel/apic/probe_32.c |    3 ---
 arch/x86/kernel/apic/probe_64.c |    3 ---
 3 files changed, 3 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1429,6 +1429,9 @@ void __init apic_intr_mode_init(void)
 		break;
 	}
 
+	if (x86_platform.apic_post_init)
+		x86_platform.apic_post_init();
+
 	apic_bsp_setup(upmode);
 }
 
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -170,9 +170,6 @@ void __init default_setup_apic_routing(v
 
 	if (apic->setup_apic_routing)
 		apic->setup_apic_routing();
-
-	if (x86_platform.apic_post_init)
-		x86_platform.apic_post_init();
 }
 
 void __init generic_apic_probe(void)
--- a/arch/x86/kernel/apic/probe_64.c
+++ b/arch/x86/kernel/apic/probe_64.c
@@ -32,9 +32,6 @@ void __init default_setup_apic_routing(v
 			break;
 		}
 	}
-
-	if (x86_platform.apic_post_init)
-		x86_platform.apic_post_init();
 }
 
 int __init default_acpi_madt_oem_check(char *oem_id, char *oem_table_id)



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 21/46] x86/pci: Reducde #ifdeffery in PCI init code
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (19 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 20/46] x86/irq: Move apic_post_init() invocation to one place Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 22/46] x86/irq: Initialize PCI/MSI domain at PCI init time Thomas Gleixner
                   ` (32 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Adding a function call before the first #ifdef in arch_pci_init() triggers
a 'mixed declarations and code' warning if PCI_DIRECT is enabled.

Use stub functions and move the #ifdeffery to the header file where it is
not in the way.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/pci_x86.h |   11 +++++++++++
 arch/x86/pci/init.c            |   10 +++-------
 2 files changed, 14 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -114,9 +114,20 @@ extern const struct pci_raw_ops pci_dire
 extern bool port_cf9_safe;
 
 /* arch_initcall level */
+#ifdef CONFIG_PCI_DIRECT
 extern int pci_direct_probe(void);
 extern void pci_direct_init(int type);
+#else
+static inline int pci_direct_probe(void) { return -1; }
+static inline  void pci_direct_init(int type) { }
+#endif
+
+#ifdef CONFIG_PCI_BIOS
 extern void pci_pcbios_init(void);
+#else
+static inline void pci_pcbios_init(void) { }
+#endif
+
 extern void __init dmi_check_pciprobe(void);
 extern void __init dmi_check_skip_isa_align(void);
 
--- a/arch/x86/pci/init.c
+++ b/arch/x86/pci/init.c
@@ -8,11 +8,9 @@
    in the right sequence from here. */
 static __init int pci_arch_init(void)
 {
-#ifdef CONFIG_PCI_DIRECT
-	int type = 0;
+	int type;
 
 	type = pci_direct_probe();
-#endif
 
 	if (!(pci_probe & PCI_PROBE_NOEARLY))
 		pci_mmcfg_early_init();
@@ -20,18 +18,16 @@ static __init int pci_arch_init(void)
 	if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
 		return 0;
 
-#ifdef CONFIG_PCI_BIOS
 	pci_pcbios_init();
-#endif
+
 	/*
 	 * don't check for raw_pci_ops here because we want pcbios as last
 	 * fallback, yet it's needed to run first to set pcibios_last_bus
 	 * in case legacy PCI probing is used. otherwise detecting peer busses
 	 * fails.
 	 */
-#ifdef CONFIG_PCI_DIRECT
 	pci_direct_init(type);
-#endif
+
 	if (!raw_pci_ops && !raw_pci_ext_ops)
 		printk(KERN_ERR
 		"PCI: Fatal: No config space access function found\n");



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 22/46] x86/irq: Initialize PCI/MSI domain at PCI init time
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (20 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 21/46] x86/pci: Reducde #ifdeffery in PCI init code Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 23/46] irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI Thomas Gleixner
                   ` (31 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

No point in initializing the default PCI/MSI interrupt domain early and no
point to create it when XEN PV/HVM/DOM0 are active.

Move the initialization to pci_arch_init() and convert it to init ops so
that XEN can override it as XEN has it's own PCI/MSI management. The XEN
override comes in a later step.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Add missing include
---
 arch/x86/include/asm/irqdomain.h |    6 ++++--
 arch/x86/include/asm/x86_init.h  |    3 +++
 arch/x86/kernel/apic/msi.c       |   31 +++++++++++++++++++------------
 arch/x86/kernel/apic/vector.c    |    2 --
 arch/x86/kernel/x86_init.c       |    4 +++-
 arch/x86/pci/init.c              |    3 +++
 6 files changed, 32 insertions(+), 17 deletions(-)

--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -51,9 +51,11 @@ extern int mp_irqdomain_ioapic_idx(struc
 #endif /* CONFIG_X86_IO_APIC */
 
 #ifdef CONFIG_PCI_MSI
-extern void arch_init_msi_domain(struct irq_domain *domain);
+void x86_create_pci_msi_domain(void);
+struct irq_domain *native_create_pci_msi_domain(void);
 #else
-static inline void arch_init_msi_domain(struct irq_domain *domain) { }
+static inline void x86_create_pci_msi_domain(void) { }
+#define native_create_pci_msi_domain	NULL
 #endif
 
 #endif
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -8,6 +8,7 @@ struct mpc_bus;
 struct mpc_cpu;
 struct mpc_table;
 struct cpuinfo_x86;
+struct irq_domain;
 
 /**
  * struct x86_init_mpparse - platform specific mpparse ops
@@ -42,12 +43,14 @@ struct x86_init_resources {
  * @intr_init:			interrupt init code
  * @intr_mode_select:		interrupt delivery mode selection
  * @intr_mode_init:		interrupt delivery mode setup
+ * @create_pci_msi_domain:	Create the PCI/MSI interrupt domain
  */
 struct x86_init_irqs {
 	void (*pre_vector_init)(void);
 	void (*intr_init)(void);
 	void (*intr_mode_select)(void);
 	void (*intr_mode_init)(void);
+	struct irq_domain *(*create_pci_msi_domain)(void);
 };
 
 /**
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -21,7 +21,7 @@
 #include <asm/apic.h>
 #include <asm/irq_remapping.h>
 
-static struct irq_domain *msi_default_domain;
+static struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
 static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 {
@@ -192,7 +192,7 @@ int native_setup_msi_irqs(struct pci_dev
 
 	domain = irq_remapping_get_irq_domain(&info);
 	if (domain == NULL)
-		domain = msi_default_domain;
+		domain = x86_pci_msi_default_domain;
 	if (domain == NULL)
 		return -ENOSYS;
 
@@ -235,25 +235,32 @@ static struct msi_domain_info pci_msi_do
 	.handler_name	= "edge",
 };
 
-void __init arch_init_msi_domain(struct irq_domain *parent)
+struct irq_domain * __init native_create_pci_msi_domain(void)
 {
 	struct fwnode_handle *fn;
+	struct irq_domain *d;
 
 	if (disable_apic)
-		return;
+		return NULL;
 
 	fn = irq_domain_alloc_named_fwnode("PCI-MSI");
-	if (fn) {
-		msi_default_domain =
-			pci_msi_create_irq_domain(fn, &pci_msi_domain_info,
-						  parent);
-	}
-	if (!msi_default_domain) {
+	if (!fn)
+		return NULL;
+
+	d = pci_msi_create_irq_domain(fn, &pci_msi_domain_info,
+				      x86_vector_domain);
+	if (!d) {
 		irq_domain_free_fwnode(fn);
-		pr_warn("failed to initialize irqdomain for MSI/MSI-x.\n");
+		pr_warn("Failed to initialize PCI-MSI irqdomain.\n");
 	} else {
-		msi_default_domain->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
+		d->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
 	}
+	return d;
+}
+
+void __init x86_create_pci_msi_domain(void)
+{
+	x86_pci_msi_default_domain = x86_init.irqs.create_pci_msi_domain();
 }
 
 #ifdef CONFIG_IRQ_REMAP
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -713,8 +713,6 @@ int __init arch_early_irq_init(void)
 	BUG_ON(x86_vector_domain == NULL);
 	irq_set_default_host(x86_vector_domain);
 
-	arch_init_msi_domain(x86_vector_domain);
-
 	BUG_ON(!alloc_cpumask_var(&vector_searchmask, GFP_KERNEL));
 
 	/*
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -24,6 +24,7 @@
 #include <asm/tsc.h>
 #include <asm/iommu.h>
 #include <asm/mach_traps.h>
+#include <asm/irqdomain.h>
 
 void x86_init_noop(void) { }
 void __init x86_init_uint_noop(unsigned int unused) { }
@@ -76,7 +77,8 @@ struct x86_init_ops x86_init __initdata
 		.pre_vector_init	= init_ISA_irqs,
 		.intr_init		= native_init_IRQ,
 		.intr_mode_select	= apic_intr_mode_select,
-		.intr_mode_init		= apic_intr_mode_init
+		.intr_mode_init		= apic_intr_mode_init,
+		.create_pci_msi_domain	= native_create_pci_msi_domain,
 	},
 
 	.oem = {
--- a/arch/x86/pci/init.c
+++ b/arch/x86/pci/init.c
@@ -3,6 +3,7 @@
 #include <linux/init.h>
 #include <asm/pci_x86.h>
 #include <asm/x86_init.h>
+#include <asm/irqdomain.h>
 
 /* arch_initcall has too random ordering, so call the initializers
    in the right sequence from here. */
@@ -10,6 +11,8 @@ static __init int pci_arch_init(void)
 {
 	int type;
 
+	x86_create_pci_msi_domain();
+
 	type = pci_direct_probe();
 
 	if (!(pci_probe & PCI_PROBE_NOEARLY))


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 23/46] irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (21 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 22/46] x86/irq: Initialize PCI/MSI domain at PCI init time Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 20:42   ` Marc Zyngier
  2020-08-26 11:16 ` [patch V2 24/46] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI Thomas Gleixner
                   ` (30 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

PCI devices behind a VMD bus are not subject to interrupt remapping, but
the irq domain for VMD MSI cannot be distinguished from a regular PCI/MSI
irq domain.

Add a new domain bus token and allow it in the bus token check in
msi_check_reservation_mode() to keep the functionality the same once VMD
uses this token.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 include/linux/irqdomain.h |    1 +
 kernel/irq/msi.c          |    7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -84,6 +84,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_FSL_MC_MSI,
 	DOMAIN_BUS_TI_SCI_INTA_MSI,
 	DOMAIN_BUS_WAKEUP,
+	DOMAIN_BUS_VMD_MSI,
 };
 
 /**
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -370,8 +370,13 @@ static bool msi_check_reservation_mode(s
 {
 	struct msi_desc *desc;
 
-	if (domain->bus_token != DOMAIN_BUS_PCI_MSI)
+	switch(domain->bus_token) {
+	case DOMAIN_BUS_PCI_MSI:
+	case DOMAIN_BUS_VMD_MSI:
+		break;
+	default:
 		return false;
+	}
 
 	if (!(info->flags & MSI_FLAG_MUST_REACTIVATE))
 		return false;



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 24/46] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (22 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 23/46] irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 20:47   ` Marc Zyngier
  2020-08-31 14:39   ` Jason Gunthorpe
  2020-08-26 11:16 ` [patch V2 25/46] PCI/MSI: Provide pci_dev_has_special_msi_domain() helper Thomas Gleixner
                   ` (29 subsequent siblings)
  53 siblings, 2 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Devices on the VMD bus use their own MSI irq domain, but it is not
distinguishable from regular PCI/MSI irq domains. This is required
to exclude VMD devices from getting the irq domain pointer set by
interrupt remapping.

Override the default bus token.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/controller/vmd.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -579,6 +579,12 @@ static int vmd_enable_domain(struct vmd_
 		return -ENODEV;
 	}
 
+	/*
+	 * Override the irq domain bus token so the domain can be distinguished
+	 * from a regular PCI/MSI domain.
+	 */
+	irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
+
 	pci_add_resource(&resources, &vmd->resources[0]);
 	pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]);
 	pci_add_resource_offset(&resources, &vmd->resources[2], offset[1]);



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 25/46] PCI/MSI: Provide pci_dev_has_special_msi_domain() helper
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (23 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 24/46] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 26/46] x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init() Thomas Gleixner
                   ` (28 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Provide a helper function to check whether a PCI device is handled by a
non-standard PCI/MSI domain. This will be used to exclude such devices
which hang of a special bus, e.g. VMD, to be excluded from the irq domain
override in irq remapping.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/msi.c   |   22 ++++++++++++++++++++++
 include/linux/msi.h |    1 +
 2 files changed, 23 insertions(+)

--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1553,4 +1553,26 @@ struct irq_domain *pci_msi_get_device_do
 					     DOMAIN_BUS_PCI_MSI);
 	return dom;
 }
+
+/**
+ * pci_dev_has_special_msi_domain - Check whether the device is handled by
+ *				    a non-standard PCI-MSI domain
+ * @pdev:	The PCI device to check.
+ *
+ * Returns: True if the device irqdomain or the bus irqdomain is
+ * non-standard PCI/MSI.
+ */
+bool pci_dev_has_special_msi_domain(struct pci_dev *pdev)
+{
+	struct irq_domain *dom = dev_get_msi_domain(&pdev->dev);
+
+	if (!dom)
+		dom = dev_get_msi_domain(&pdev->bus->dev);
+
+	if (!dom)
+		return true;
+
+	return dom->bus_token != DOMAIN_BUS_PCI_MSI;
+}
+
 #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -374,6 +374,7 @@ int pci_msi_domain_check_cap(struct irq_
 			     struct msi_domain_info *info, struct device *dev);
 u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
 struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev);
+bool pci_dev_has_special_msi_domain(struct pci_dev *pdev);
 #else
 static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
 {


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 26/46] x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (24 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 25/46] PCI/MSI: Provide pci_dev_has_special_msi_domain() helper Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:16 ` [patch V2 27/46] x86/xen: Rework MSI teardown Thomas Gleixner
                   ` (27 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

The only user is in the same file and the name is too generic because this
function is only ever used for HVM domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross<jgross@suse.com>


---
 arch/x86/pci/xen.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -420,7 +420,7 @@ int __init pci_xen_init(void)
 }
 
 #ifdef CONFIG_PCI_MSI
-void __init xen_msi_init(void)
+static void __init xen_hvm_msi_init(void)
 {
 	if (!disable_apic) {
 		/*
@@ -460,7 +460,7 @@ int __init pci_xen_hvm_init(void)
 	 * We need to wait until after x2apic is initialized
 	 * before we can set MSI IRQ ops.
 	 */
-	x86_platform.apic_post_init = xen_msi_init;
+	x86_platform.apic_post_init = xen_hvm_msi_init;
 #endif
 	return 0;
 }


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 27/46] x86/xen: Rework MSI teardown
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (25 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 26/46] x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init() Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-27  7:46   ` Jürgen Groß
  2020-08-26 11:16 ` [patch V2 28/46] x86/xen: Consolidate XEN-MSI init Thomas Gleixner
                   ` (26 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

X86 cannot store the irq domain pointer in struct device without breaking
XEN because the irq domain pointer takes precedence over arch_*_msi_irqs()
fallbacks.

XENs MSI teardown relies on default_teardown_msi_irqs() which invokes
arch_teardown_msi_irq(). default_teardown_msi_irqs() is a trivial iterator
over the msi entries associated to a device.

Implement this loop in xen_teardown_msi_irqs() to prepare for removal of
the fallbacks for X86.

This is a preparatory step to wrap XEN MSI alloc/free into a irq domain
which in turn allows to store the irq domain pointer in struct device and
to use the irq domain functions directly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Use teardown_pv... for initial domain (Juergen)
---
 arch/x86/pci/xen.c |   23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -377,20 +377,31 @@ static void xen_initdom_restore_msi_irqs
 static void xen_teardown_msi_irqs(struct pci_dev *dev)
 {
 	struct msi_desc *msidesc;
+	int i;
+
+	for_each_pci_msi_entry(msidesc, dev) {
+		if (msidesc->irq) {
+			for (i = 0; i < msidesc->nvec_used; i++)
+				xen_destroy_irq(msidesc->irq + i);
+		}
+	}
+}
+
+static void xen_pv_teardown_msi_irqs(struct pci_dev *dev)
+{
+	struct msi_desc *msidesc = first_pci_msi_entry(dev);
 
-	msidesc = first_pci_msi_entry(dev);
 	if (msidesc->msi_attrib.is_msix)
 		xen_pci_frontend_disable_msix(dev);
 	else
 		xen_pci_frontend_disable_msi(dev);
 
-	/* Free the IRQ's and the msidesc using the generic code. */
-	default_teardown_msi_irqs(dev);
+	xen_teardown_msi_irqs(dev);
 }
 
 static void xen_teardown_msi_irq(unsigned int irq)
 {
-	xen_destroy_irq(irq);
+	WARN_ON_ONCE(1);
 }
 
 #endif
@@ -413,7 +424,7 @@ int __init pci_xen_init(void)
 #ifdef CONFIG_PCI_MSI
 	x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
 	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
-	x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+	x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
 	pci_msi_ignore_mask = 1;
 #endif
 	return 0;
@@ -437,6 +448,7 @@ static void __init xen_hvm_msi_init(void
 	}
 
 	x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
+	x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
 	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
 }
 #endif
@@ -473,6 +485,7 @@ int __init pci_xen_initial_domain(void)
 #ifdef CONFIG_PCI_MSI
 	x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
 	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+	x86_msi.teardown_msi_irqs = xen_teardown_pv_msi_irqs;
 	x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
 	pci_msi_ignore_mask = 1;
 #endif


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 28/46] x86/xen: Consolidate XEN-MSI init
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (26 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 27/46] x86/xen: Rework MSI teardown Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-27  7:47   ` Jürgen Groß
  2020-08-26 11:16 ` [patch V2 29/46] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs() Thomas Gleixner
                   ` (25 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

X86 cannot store the irq domain pointer in struct device without breaking
XEN because the irq domain pointer takes precedence over arch_*_msi_irqs()
fallbacks.

To achieve this XEN MSI interrupt management needs to be wrapped into an
irq domain.

Move the x86_msi ops setup into a single function to prepare for this.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Use the proper logic to select initial domain setup
---
 arch/x86/pci/xen.c |   51 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 19 deletions(-)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -372,7 +372,10 @@ static void xen_initdom_restore_msi_irqs
 		WARN(ret && ret != -ENOSYS, "restore_msi -> %d\n", ret);
 	}
 }
-#endif
+#else /* CONFIG_XEN_DOM0 */
+#define xen_initdom_setup_msi_irqs	NULL
+#define xen_initdom_restore_msi_irqs	NULL
+#endif /* !CONFIG_XEN_DOM0 */
 
 static void xen_teardown_msi_irqs(struct pci_dev *dev)
 {
@@ -404,7 +407,31 @@ static void xen_teardown_msi_irq(unsigne
 	WARN_ON_ONCE(1);
 }
 
-#endif
+static __init void xen_setup_pci_msi(void)
+{
+	if (xen_pv_domain()) {
+		if (xen_initial_domain()) {
+			x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
+			x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
+		} else {
+			x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
+		}
+		x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
+		pci_msi_ignore_mask = 1;
+	} else if (xen_hvm_domain()) {
+		x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
+		x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+	} else {
+		WARN_ON_ONCE(1);
+		return;
+	}
+
+	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+}
+
+#else /* CONFIG_PCI_MSI */
+static inline void xen_setup_pci_msi(void) { }
+#endif /* CONFIG_PCI_MSI */
 
 int __init pci_xen_init(void)
 {
@@ -421,12 +448,7 @@ int __init pci_xen_init(void)
 	/* Keep ACPI out of the picture */
 	acpi_noirq_set();
 
-#ifdef CONFIG_PCI_MSI
-	x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
-	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
-	x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
-	pci_msi_ignore_mask = 1;
-#endif
+	xen_setup_pci_msi();
 	return 0;
 }
 
@@ -446,10 +468,7 @@ static void __init xen_hvm_msi_init(void
 		    ((eax & XEN_HVM_CPUID_APIC_ACCESS_VIRT) && boot_cpu_has(X86_FEATURE_APIC)))
 			return;
 	}
-
-	x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
-	x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
-	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+	xen_setup_pci_msi();
 }
 #endif
 
@@ -482,13 +501,7 @@ int __init pci_xen_initial_domain(void)
 {
 	int irq;
 
-#ifdef CONFIG_PCI_MSI
-	x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
-	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
-	x86_msi.teardown_msi_irqs = xen_teardown_pv_msi_irqs;
-	x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
-	pci_msi_ignore_mask = 1;
-#endif
+	xen_setup_pci_msi();
 	__acpi_register_gsi = acpi_register_gsi_xen;
 	__acpi_unregister_gsi = NULL;
 	/*


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 29/46] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (27 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 28/46] x86/xen: Consolidate XEN-MSI init Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 19:06   ` Marc Zyngier
  2020-08-28  0:24   ` Dey, Megha
  2020-08-26 11:16 ` [patch V2 30/46] x86/xen: Wrap XEN MSI management into irqdomain Thomas Gleixner
                   ` (24 subsequent siblings)
  53 siblings, 2 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

To support MSI irq domains which do not fit at all into the regular MSI
irqdomain scheme, like the XEN MSI interrupt management for PV/HVM/DOM0,
it's necessary to allow to override the alloc/free implementation.

This is a preperatory step to switch X86 away from arch_*_msi_irqs() and
store the irq domain pointer right in struct device.

No functional change for existing MSI irq domain users.

Aside of the evil XEN wrapper this is also useful for special MSI domains
which need to do extra alloc/free work before/after calling the generic
core function. Work like allocating/freeing MSI descriptors, MSI storage
space etc.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 include/linux/msi.h |   27 ++++++++++++++++++++
 kernel/irq/msi.c    |   70 +++++++++++++++++++++++++++++++++++-----------------
 2 files changed, 75 insertions(+), 22 deletions(-)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -241,6 +241,10 @@ struct msi_domain_info;
  * @msi_finish:		Optional callback to finalize the allocation
  * @set_desc:		Set the msi descriptor for an interrupt
  * @handle_error:	Optional error handler if the allocation fails
+ * @domain_alloc_irqs:	Optional function to override the default allocation
+ *			function.
+ * @domain_free_irqs:	Optional function to override the default free
+ *			function.
  *
  * @get_hwirq, @msi_init and @msi_free are callbacks used by
  * msi_create_irq_domain() and related interfaces
@@ -248,6 +252,22 @@ struct msi_domain_info;
  * @msi_check, @msi_prepare, @msi_finish, @set_desc and @handle_error
  * are callbacks used by msi_domain_alloc_irqs() and related
  * interfaces which are based on msi_desc.
+ *
+ * @domain_alloc_irqs, @domain_free_irqs can be used to override the
+ * default allocation/free functions (__msi_domain_alloc/free_irqs). This
+ * is initially for a wrapper around XENs seperate MSI universe which can't
+ * be wrapped into the regular irq domains concepts by mere mortals.  This
+ * allows to universally use msi_domain_alloc/free_irqs without having to
+ * special case XEN all over the place.
+ *
+ * Contrary to other operations @domain_alloc_irqs and @domain_free_irqs
+ * are set to the default implementation if NULL and even when
+ * MSI_FLAG_USE_DEF_DOM_OPS is not set to avoid breaking existing users and
+ * because these callbacks are obviously mandatory.
+ *
+ * This is NOT meant to be abused, but it can be useful to build wrappers
+ * for specialized MSI irq domains which need extra work before and after
+ * calling __msi_domain_alloc_irqs()/__msi_domain_free_irqs().
  */
 struct msi_domain_ops {
 	irq_hw_number_t	(*get_hwirq)(struct msi_domain_info *info,
@@ -270,6 +290,10 @@ struct msi_domain_ops {
 				    struct msi_desc *desc);
 	int		(*handle_error)(struct irq_domain *domain,
 					struct msi_desc *desc, int error);
+	int		(*domain_alloc_irqs)(struct irq_domain *domain,
+					     struct device *dev, int nvec);
+	void		(*domain_free_irqs)(struct irq_domain *domain,
+					    struct device *dev);
 };
 
 /**
@@ -327,8 +351,11 @@ int msi_domain_set_affinity(struct irq_d
 struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode,
 					 struct msi_domain_info *info,
 					 struct irq_domain *parent);
+int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
+			    int nvec);
 int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
 			  int nvec);
+void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev);
 void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev);
 struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain);
 
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -229,11 +229,13 @@ static int msi_domain_ops_check(struct i
 }
 
 static struct msi_domain_ops msi_domain_ops_default = {
-	.get_hwirq	= msi_domain_ops_get_hwirq,
-	.msi_init	= msi_domain_ops_init,
-	.msi_check	= msi_domain_ops_check,
-	.msi_prepare	= msi_domain_ops_prepare,
-	.set_desc	= msi_domain_ops_set_desc,
+	.get_hwirq		= msi_domain_ops_get_hwirq,
+	.msi_init		= msi_domain_ops_init,
+	.msi_check		= msi_domain_ops_check,
+	.msi_prepare		= msi_domain_ops_prepare,
+	.set_desc		= msi_domain_ops_set_desc,
+	.domain_alloc_irqs	= __msi_domain_alloc_irqs,
+	.domain_free_irqs	= __msi_domain_free_irqs,
 };
 
 static void msi_domain_update_dom_ops(struct msi_domain_info *info)
@@ -245,6 +247,14 @@ static void msi_domain_update_dom_ops(st
 		return;
 	}
 
+	if (ops->domain_alloc_irqs == NULL)
+		ops->domain_alloc_irqs = msi_domain_ops_default.domain_alloc_irqs;
+	if (ops->domain_free_irqs == NULL)
+		ops->domain_free_irqs = msi_domain_ops_default.domain_free_irqs;
+
+	if (!(info->flags & MSI_FLAG_USE_DEF_DOM_OPS))
+		return;
+
 	if (ops->get_hwirq == NULL)
 		ops->get_hwirq = msi_domain_ops_default.get_hwirq;
 	if (ops->msi_init == NULL)
@@ -278,8 +288,7 @@ struct irq_domain *msi_create_irq_domain
 {
 	struct irq_domain *domain;
 
-	if (info->flags & MSI_FLAG_USE_DEF_DOM_OPS)
-		msi_domain_update_dom_ops(info);
+	msi_domain_update_dom_ops(info);
 	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
 		msi_domain_update_chip_ops(info);
 
@@ -386,17 +395,8 @@ static bool msi_check_reservation_mode(s
 	return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit;
 }
 
-/**
- * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain
- * @domain:	The domain to allocate from
- * @dev:	Pointer to device struct of the device for which the interrupts
- *		are allocated
- * @nvec:	The number of interrupts to allocate
- *
- * Returns 0 on success or an error code.
- */
-int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
-			  int nvec)
+int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
+			    int nvec)
 {
 	struct msi_domain_info *info = domain->host_data;
 	struct msi_domain_ops *ops = info->ops;
@@ -490,12 +490,24 @@ int msi_domain_alloc_irqs(struct irq_dom
 }
 
 /**
- * msi_domain_free_irqs - Free interrupts from a MSI interrupt @domain associated tp @dev
- * @domain:	The domain to managing the interrupts
+ * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain
+ * @domain:	The domain to allocate from
  * @dev:	Pointer to device struct of the device for which the interrupts
- *		are free
+ *		are allocated
+ * @nvec:	The number of interrupts to allocate
+ *
+ * Returns 0 on success or an error code.
  */
-void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
+int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
+			  int nvec)
+{
+	struct msi_domain_info *info = domain->host_data;
+	struct msi_domain_ops *ops = info->ops;
+
+	return ops->domain_alloc_irqs(domain, dev, nvec);
+}
+
+void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
 {
 	struct msi_desc *desc;
 
@@ -513,6 +525,20 @@ void msi_domain_free_irqs(struct irq_dom
 }
 
 /**
+ * __msi_domain_free_irqs - Free interrupts from a MSI interrupt @domain associated tp @dev
+ * @domain:	The domain to managing the interrupts
+ * @dev:	Pointer to device struct of the device for which the interrupts
+ *		are free
+ */
+void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
+{
+	struct msi_domain_info *info = domain->host_data;
+	struct msi_domain_ops *ops = info->ops;
+
+	return ops->domain_free_irqs(domain, dev);
+}
+
+/**
  * msi_get_domain_info - Get the MSI interrupt domain info for @domain
  * @domain:	The interrupt domain to retrieve data from
  *



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 30/46] x86/xen: Wrap XEN MSI management into irqdomain
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (28 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 29/46] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs() Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2023-01-15 14:12   ` David Woodhouse
  2020-08-26 11:16 ` [patch V2 31/46] iommm/vt-d: Store irq domain in struct device Thomas Gleixner
                   ` (23 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

To allow utilizing the irq domain pointer in struct device it is necessary
to make XEN/MSI irq domain compatible.

While the right solution would be to truly convert XEN to irq domains, this
is an exercise which is not possible for mere mortals with limited XENology.

Provide a plain irqdomain wrapper around XEN. While this is blatant
violation of the irqdomain design, it's the only solution for a XEN igorant
person to make progress on the issue which triggered this change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Juergen Gross <jgross@suse.com>

---
Note: This is completely untested, but it compiles so it must be perfect.
---
 arch/x86/pci/xen.c |   63 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -407,6 +407,63 @@ static void xen_teardown_msi_irq(unsigne
 	WARN_ON_ONCE(1);
 }
 
+static int xen_msi_domain_alloc_irqs(struct irq_domain *domain,
+				     struct device *dev,  int nvec)
+{
+	int type;
+
+	if (WARN_ON_ONCE(!dev_is_pci(dev)))
+		return -EINVAL;
+
+	if (first_msi_entry(dev)->msi_attrib.is_msix)
+		type = PCI_CAP_ID_MSIX;
+	else
+		type = PCI_CAP_ID_MSI;
+
+	return x86_msi.setup_msi_irqs(to_pci_dev(dev), nvec, type);
+}
+
+static void xen_msi_domain_free_irqs(struct irq_domain *domain,
+				     struct device *dev)
+{
+	if (WARN_ON_ONCE(!dev_is_pci(dev)))
+		return;
+
+	x86_msi.teardown_msi_irqs(to_pci_dev(dev));
+}
+
+static struct msi_domain_ops xen_pci_msi_domain_ops = {
+	.domain_alloc_irqs	= xen_msi_domain_alloc_irqs,
+	.domain_free_irqs	= xen_msi_domain_free_irqs,
+};
+
+static struct msi_domain_info xen_pci_msi_domain_info = {
+	.ops			= &xen_pci_msi_domain_ops,
+};
+
+/*
+ * This irq domain is a blatant violation of the irq domain design, but
+ * distangling XEN into real irq domains is not a job for mere mortals with
+ * limited XENology. But it's the least dangerous way for a mere mortal to
+ * get rid of the arch_*_msi_irqs() hackery in order to store the irq
+ * domain pointer in struct device. This irq domain wrappery allows to do
+ * that without breaking XEN terminally.
+ */
+static __init struct irq_domain *xen_create_pci_msi_domain(void)
+{
+	struct irq_domain *d = NULL;
+	struct fwnode_handle *fn;
+
+	fn = irq_domain_alloc_named_fwnode("XEN-MSI");
+	if (fn)
+		d = msi_create_irq_domain(fn, &xen_pci_msi_domain_info, NULL);
+
+	/* FIXME: No idea how to survive if this fails */
+	BUG_ON(!d);
+
+	return d;
+}
+
 static __init void xen_setup_pci_msi(void)
 {
 	if (xen_pv_domain()) {
@@ -427,6 +484,12 @@ static __init void xen_setup_pci_msi(voi
 	}
 
 	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+
+	/*
+	 * Override the PCI/MSI irq domain init function. No point
+	 * in allocating the native domain and never use it.
+	 */
+	x86_init.irqs.create_pci_msi_domain = xen_create_pci_msi_domain;
 }
 
 #else /* CONFIG_PCI_MSI */


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 31/46] iommm/vt-d: Store irq domain in struct device
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (29 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 30/46] x86/xen: Wrap XEN MSI management into irqdomain Thomas Gleixner
@ 2020-08-26 11:16 ` Thomas Gleixner
  2020-08-26 11:17 ` [patch V2 32/46] iommm/amd: " Thomas Gleixner
                   ` (22 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:16 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

As a first step to make X86 utilize the direct MSI irq domain operations
store the irq domain pointer in the device struct when a device is probed.

This is done from dmar_pci_bus_add_dev() because it has to work even when
DMA remapping is disabled. It only overrides the irqdomain of devices which
are handled by a regular PCI/MSI irq domain which protects PCI devices
behind special busses like VMD which have their own irq domain.

No functional change. It just avoids the redirection through
arch_*_msi_irqs() and allows the PCI/MSI core to directly invoke the irq
domain alloc/free functions instead of having to look up the irq domain for
every single MSI interupt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Add missing forward declaration
---
 drivers/iommu/intel/dmar.c          |    3 +++
 drivers/iommu/intel/irq_remapping.c |   16 ++++++++++++++++
 include/linux/intel-iommu.h         |    7 +++++++
 3 files changed, 26 insertions(+)

--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -316,6 +316,9 @@ static int dmar_pci_bus_add_dev(struct d
 	if (ret < 0 && dmar_dev_scope_status == 0)
 		dmar_dev_scope_status = ret;
 
+	if (ret >= 0)
+		intel_irq_remap_add_device(info);
+
 	return ret;
 }
 
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1086,6 +1086,22 @@ static int reenable_irq_remapping(int ei
 	return -1;
 }
 
+/*
+ * Store the MSI remapping domain pointer in the device if enabled.
+ *
+ * This is called from dmar_pci_bus_add_dev() so it works even when DMA
+ * remapping is disabled. Only update the pointer if the device is not
+ * already handled by a non default PCI/MSI interrupt domain. This protects
+ * e.g. VMD devices.
+ */
+void intel_irq_remap_add_device(struct dmar_pci_notify_info *info)
+{
+	if (!irq_remapping_enabled || pci_dev_has_special_msi_domain(info->dev))
+		return;
+
+	dev_set_msi_domain(&info->dev->dev, map_dev_to_ir(info->dev));
+}
+
 static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 {
 	memset(irte, 0, sizeof(*irte));
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -425,6 +425,8 @@ struct q_inval {
 	int             free_cnt;
 };
 
+struct dmar_pci_notify_info;
+
 #ifdef CONFIG_IRQ_REMAP
 /* 1MB - maximum possible interrupt remapping table size */
 #define INTR_REMAP_PAGE_ORDER	8
@@ -439,6 +441,11 @@ struct ir_table {
 	struct irte *base;
 	unsigned long *bitmap;
 };
+
+void intel_irq_remap_add_device(struct dmar_pci_notify_info *info);
+#else
+static inline void
+intel_irq_remap_add_device(struct dmar_pci_notify_info *info) { }
 #endif
 
 struct iommu_flush {


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 32/46] iommm/amd: Store irq domain in struct device
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (30 preceding siblings ...)
  2020-08-26 11:16 ` [patch V2 31/46] iommm/vt-d: Store irq domain in struct device Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-26 11:17 ` [patch V2 33/46] x86/pci: Set default irq domain in pcibios_add_device() Thomas Gleixner
                   ` (21 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

As the next step to make X86 utilize the direct MSI irq domain operations
store the irq domain pointer in the device struct when a device is probed.

It only overrides the irqdomain of devices which are handled by a regular
PCI/MSI irq domain which protects PCI devices behind special busses like
VMD which have their own irq domain.

No functional change.

It just avoids the redirection through arch_*_msi_irqs() and allows the
PCI/MSI core to directly invoke the irq domain alloc/free functions instead
of having to look up the irq domain for every single MSI interupt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 drivers/iommu/amd/iommu.c |   17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -729,7 +729,21 @@ static void iommu_poll_ga_log(struct amd
 		}
 	}
 }
-#endif /* CONFIG_IRQ_REMAP */
+
+static void
+amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu)
+{
+	if (!irq_remapping_enabled || !dev_is_pci(dev) ||
+	    pci_dev_has_special_msi_domain(to_pci_dev(dev)))
+		return;
+
+	dev_set_msi_domain(dev, iommu->msi_domain);
+}
+
+#else /* CONFIG_IRQ_REMAP */
+static inline void
+amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu) { }
+#endif /* !CONFIG_IRQ_REMAP */
 
 #define AMD_IOMMU_INT_MASK	\
 	(MMIO_STATUS_EVT_INT_MASK | \
@@ -2157,6 +2171,7 @@ static struct iommu_device *amd_iommu_pr
 		iommu_dev = ERR_PTR(ret);
 		iommu_ignore_device(dev);
 	} else {
+		amd_iommu_set_pci_msi_domain(dev, iommu);
 		iommu_dev = &iommu->iommu;
 	}
 



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 33/46] x86/pci: Set default irq domain in pcibios_add_device()
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (31 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 32/46] iommm/amd: " Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-26 11:17 ` [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable Thomas Gleixner
                   ` (20 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Now that interrupt remapping sets the irqdomain pointer when a PCI device
is added it's possible to store the default irq domain in the device struct
in pcibios_add_device().

If the bus to which a device is connected has an irq domain associated then
this domain is used otherwise the default domain (PCI/MSI native or XEN
PCI/MSI) is used. Using the bus domain ensures that special MSI bus domains
like VMD work.

This makes XEN and the non-remapped native case work solely based on the
irq domain pointer in struct device for PCI/MSI and allows to remove the
arch fallback and make most of the x86_msi ops private to XEN in the next
steps.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/irqdomain.h |    2 ++
 arch/x86/kernel/apic/msi.c       |    2 +-
 arch/x86/pci/common.c            |   18 +++++++++++++++++-
 3 files changed, 20 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -53,9 +53,11 @@ extern int mp_irqdomain_ioapic_idx(struc
 #ifdef CONFIG_PCI_MSI
 void x86_create_pci_msi_domain(void);
 struct irq_domain *native_create_pci_msi_domain(void);
+extern struct irq_domain *x86_pci_msi_default_domain;
 #else
 static inline void x86_create_pci_msi_domain(void) { }
 #define native_create_pci_msi_domain	NULL
+#define x86_pci_msi_default_domain	NULL
 #endif
 
 #endif
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -21,7 +21,7 @@
 #include <asm/apic.h>
 #include <asm/irq_remapping.h>
 
-static struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
+struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
 static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 {
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -19,6 +19,7 @@
 #include <asm/smp.h>
 #include <asm/pci_x86.h>
 #include <asm/setup.h>
+#include <asm/irqdomain.h>
 
 unsigned int pci_probe = PCI_PROBE_BIOS | PCI_PROBE_CONF1 | PCI_PROBE_CONF2 |
 				PCI_PROBE_MMCONF;
@@ -633,8 +634,9 @@ static void set_dev_domain_options(struc
 
 int pcibios_add_device(struct pci_dev *dev)
 {
-	struct setup_data *data;
 	struct pci_setup_rom *rom;
+	struct irq_domain *msidom;
+	struct setup_data *data;
 	u64 pa_data;
 
 	pa_data = boot_params.hdr.setup_data;
@@ -661,6 +663,20 @@ int pcibios_add_device(struct pci_dev *d
 		memunmap(data);
 	}
 	set_dev_domain_options(dev);
+
+	/*
+	 * Setup the initial MSI domain of the device. If the underlying
+	 * bus has a PCI/MSI irqdomain associated use the bus domain,
+	 * otherwise set the default domain. This ensures that special irq
+	 * domains e.g. VMD are preserved. The default ensures initial
+	 * operation if irq remapping is not active. If irq remapping is
+	 * active it will overwrite the domain pointer when the device is
+	 * associated to a remapping domain.
+	 */
+	msidom = dev_get_msi_domain(&dev->bus->dev);
+	if (!msidom)
+		msidom = x86_pci_msi_default_domain;
+	dev_set_msi_domain(&dev->dev, msidom);
 	return 0;
 }
 



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (32 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 33/46] x86/pci: Set default irq domain in pcibios_add_device() Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-26 15:53   ` Thomas Gleixner
                     ` (3 more replies)
  2020-08-26 11:17 ` [patch V2 35/46] x86/irq: Cleanup the arch_*_msi_irqs() leftovers Thomas Gleixner
                   ` (19 subsequent siblings)
  53 siblings, 4 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
requires them or not. Architectures which are fully utilizing hierarchical
irq domains should never call into that code.

It's not only architectures which depend on that by implementing one or
more of the weak functions, there is also a bunch of drivers which relies
on the weak functions which invoke msi_controller::setup_irq[s] and
msi_controller::teardown_irq.

Make the architectures and drivers which rely on them select them in Kconfig
and if not selected replace them by stub functions which emit a warning and
fail the PCI/MSI interrupt allocation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Make the architectures (and drivers) which need the fallbacks select them
    and not the other way round (Bjorn).
---
 arch/ia64/Kconfig              |    1 +
 arch/mips/Kconfig              |    1 +
 arch/powerpc/Kconfig           |    1 +
 arch/s390/Kconfig              |    1 +
 arch/sparc/Kconfig             |    1 +
 arch/x86/Kconfig               |    1 +
 drivers/pci/Kconfig            |    3 +++
 drivers/pci/controller/Kconfig |    3 +++
 drivers/pci/msi.c              |    3 ++-
 include/linux/msi.h            |   31 ++++++++++++++++++++++++++-----
 10 files changed, 40 insertions(+), 6 deletions(-)

--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -56,6 +56,7 @@ config IA64
 	select NEED_DMA_MAP_STATE
 	select NEED_SG_DMA_LENGTH
 	select NUMA if !FLATMEM
+	select PCI_MSI_ARCH_FALLBACKS
 	default y
 	help
 	  The Itanium Processor Family is Intel's 64-bit successor to
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -86,6 +86,7 @@ config MIPS
 	select MODULES_USE_ELF_REL if MODULES
 	select MODULES_USE_ELF_RELA if MODULES && 64BIT
 	select PERF_USE_VMALLOC
+	select PCI_MSI_ARCH_FALLBACKS
 	select RTC_LIB
 	select SYSCTL_EXCEPTION_TRACE
 	select VIRT_TO_BUS
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -246,6 +246,7 @@ config PPC
 	select OLD_SIGACTION			if PPC32
 	select OLD_SIGSUSPEND
 	select PCI_DOMAINS			if PCI
+	select PCI_MSI_ARCH_FALLBACKS
 	select PCI_SYSCALL			if PCI
 	select PPC_DAWR				if PPC64
 	select RTC_LIB
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -185,6 +185,7 @@ config S390
 	select OLD_SIGSUSPEND3
 	select PCI_DOMAINS		if PCI
 	select PCI_MSI			if PCI
+	select PCI_MSI_ARCH_FALLBACKS
 	select SPARSE_IRQ
 	select SYSCTL_EXCEPTION_TRACE
 	select THREAD_INFO_IN_TASK
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -43,6 +43,7 @@ config SPARC
 	select GENERIC_STRNLEN_USER
 	select MODULES_USE_ELF_RELA
 	select PCI_SYSCALL if PCI
+	select PCI_MSI_ARCH_FALLBACKS
 	select ODD_RT_SIGACTION
 	select OLD_SIGSUSPEND
 	select CPU_NO_EFFICIENT_FFS
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -225,6 +225,7 @@ config X86
 	select NEED_SG_DMA_LENGTH
 	select PCI_DOMAINS			if PCI
 	select PCI_LOCKLESS_CONFIG		if PCI
+	select PCI_MSI_ARCH_FALLBACKS
 	select PERF_EVENTS
 	select RTC_LIB
 	select RTC_MC146818_LIB
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -56,6 +56,9 @@ config PCI_MSI_IRQ_DOMAIN
 	depends on PCI_MSI
 	select GENERIC_MSI_IRQ_DOMAIN
 
+config PCI_MSI_ARCH_FALLBACKS
+	bool
+
 config PCI_QUIRKS
 	default y
 	bool "Enable PCI quirk workarounds" if EXPERT
--- a/drivers/pci/controller/Kconfig
+++ b/drivers/pci/controller/Kconfig
@@ -41,6 +41,7 @@ config PCI_TEGRA
 	bool "NVIDIA Tegra PCIe controller"
 	depends on ARCH_TEGRA || COMPILE_TEST
 	depends on PCI_MSI_IRQ_DOMAIN
+	select PCI_MSI_ARCH_FALLBACKS
 	help
 	  Say Y here if you want support for the PCIe host controller found
 	  on NVIDIA Tegra SoCs.
@@ -67,6 +68,7 @@ config PCIE_RCAR_HOST
 	bool "Renesas R-Car PCIe host controller"
 	depends on ARCH_RENESAS || COMPILE_TEST
 	depends on PCI_MSI_IRQ_DOMAIN
+	select PCI_MSI_ARCH_FALLBACKS
 	help
 	  Say Y here if you want PCIe controller support on R-Car SoCs in host
 	  mode.
@@ -103,6 +105,7 @@ config PCIE_XILINX_CPM
 	bool "Xilinx Versal CPM host bridge support"
 	depends on ARCH_ZYNQMP || COMPILE_TEST
 	select PCI_HOST_COMMON
+	select PCI_MSI_ARCH_FALLBACKS
 	help
 	  Say 'Y' here if you want kernel support for the
 	  Xilinx Versal CPM host bridge.
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -58,8 +58,8 @@ static void pci_msi_teardown_msi_irqs(st
 #define pci_msi_teardown_msi_irqs	arch_teardown_msi_irqs
 #endif
 
+#ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
 /* Arch hooks */
-
 int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
 {
 	struct msi_controller *chip = dev->bus->msi;
@@ -132,6 +132,7 @@ void __weak arch_teardown_msi_irqs(struc
 {
 	return default_teardown_msi_irqs(dev);
 }
+#endif /* CONFIG_PCI_MSI_ARCH_FALLBACKS */
 
 static void default_restore_msi_irq(struct pci_dev *dev, int irq)
 {
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -193,17 +193,38 @@ void pci_msi_mask_irq(struct irq_data *d
 void pci_msi_unmask_irq(struct irq_data *data);
 
 /*
- * The arch hooks to setup up msi irqs. Those functions are
- * implemented as weak symbols so that they /can/ be overriden by
- * architecture specific code if needed.
+ * The arch hooks to setup up msi irqs. Default functions are implemented
+ * as weak symbols so that they /can/ be overriden by architecture specific
+ * code if needed. These hooks must be enabled by the architecture or by
+ * drivers which depend on them via msi_controller based MSI handling.
+ *
+ * If CONFIG_PCI_MSI_ARCH_FALLBACKS is not selected they are replaced by
+ * stubs with warnings.
  */
+#ifdef CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS
 int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc);
 void arch_teardown_msi_irq(unsigned int irq);
 int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
 void arch_teardown_msi_irqs(struct pci_dev *dev);
-void arch_restore_msi_irqs(struct pci_dev *dev);
-
 void default_teardown_msi_irqs(struct pci_dev *dev);
+#else
+static inline int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+{
+	WARN_ON_ONCE(1);
+	return -ENODEV;
+}
+
+static inline void arch_teardown_msi_irqs(struct pci_dev *dev)
+{
+	WARN_ON_ONCE(1);
+}
+#endif
+
+/*
+ * The restore hooks are still available as they are useful even
+ * for fully irq domain based setups. Courtesy to XEN/X86.
+ */
+void arch_restore_msi_irqs(struct pci_dev *dev);
 void default_restore_msi_irqs(struct pci_dev *dev);
 
 struct msi_controller {


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 35/46] x86/irq: Cleanup the arch_*_msi_irqs() leftovers
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (33 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-26 11:17 ` [patch V2 36/46] x86/irq: Make most MSI ops XEN private Thomas Gleixner
                   ` (18 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Get rid of all the gunk and remove the 'select PCI_MSI_ARCH_FALLBACK' from
the x86 Kconfig so the weak functions in the PCI core are replaced by stubs
which emit a warning, which ensures that any fail to set the irq domain
pointer results in a warning when the device is used.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Adjust to the PCI_MSI_ARCH_FALLBACK change, i.e. remove it instead
    of selecting the disabler.
---
 arch/x86/Kconfig                |    1 -
 arch/x86/include/asm/pci.h      |   11 -----------
 arch/x86/include/asm/x86_init.h |    1 -
 arch/x86/kernel/apic/msi.c      |   22 ----------------------
 arch/x86/kernel/x86_init.c      |   18 ------------------
 arch/x86/pci/xen.c              |    7 -------
 6 files changed, 60 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -225,7 +225,6 @@ config X86
 	select NEED_SG_DMA_LENGTH
 	select PCI_DOMAINS			if PCI
 	select PCI_LOCKLESS_CONFIG		if PCI
-	select PCI_MSI_ARCH_FALLBACKS
 	select PERF_EVENTS
 	select RTC_LIB
 	select RTC_MC146818_LIB
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -105,17 +105,6 @@ static inline void early_quirks(void) {
 
 extern void pci_iommu_alloc(void);
 
-#ifdef CONFIG_PCI_MSI
-/* implemented in arch/x86/kernel/apic/io_apic. */
-struct msi_desc;
-int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
-void native_teardown_msi_irq(unsigned int irq);
-void native_restore_msi_irqs(struct pci_dev *dev);
-#else
-#define native_setup_msi_irqs		NULL
-#define native_teardown_msi_irq		NULL
-#endif
-
 /* generic pci stuff */
 #include <asm-generic/pci.h>
 
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -277,7 +277,6 @@ struct pci_dev;
 
 struct x86_msi_ops {
 	int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
-	void (*teardown_msi_irq)(unsigned int irq);
 	void (*teardown_msi_irqs)(struct pci_dev *dev);
 	void (*restore_msi_irqs)(struct pci_dev *dev);
 };
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -182,28 +182,6 @@ static struct irq_chip pci_msi_controlle
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
-int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
-{
-	struct irq_domain *domain;
-	struct irq_alloc_info info;
-
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
-
-	domain = irq_remapping_get_irq_domain(&info);
-	if (domain == NULL)
-		domain = x86_pci_msi_default_domain;
-	if (domain == NULL)
-		return -ENOSYS;
-
-	return msi_domain_alloc_irqs(domain, &dev->dev, nvec);
-}
-
-void native_teardown_msi_irq(unsigned int irq)
-{
-	irq_domain_free_irqs(irq, 1);
-}
-
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 		    msi_alloc_info_t *arg)
 {
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -146,28 +146,10 @@ EXPORT_SYMBOL_GPL(x86_platform);
 
 #if defined(CONFIG_PCI_MSI)
 struct x86_msi_ops x86_msi __ro_after_init = {
-	.setup_msi_irqs		= native_setup_msi_irqs,
-	.teardown_msi_irq	= native_teardown_msi_irq,
-	.teardown_msi_irqs	= default_teardown_msi_irqs,
 	.restore_msi_irqs	= default_restore_msi_irqs,
 };
 
 /* MSI arch specific hooks */
-int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
-{
-	return x86_msi.setup_msi_irqs(dev, nvec, type);
-}
-
-void arch_teardown_msi_irqs(struct pci_dev *dev)
-{
-	x86_msi.teardown_msi_irqs(dev);
-}
-
-void arch_teardown_msi_irq(unsigned int irq)
-{
-	x86_msi.teardown_msi_irq(irq);
-}
-
 void arch_restore_msi_irqs(struct pci_dev *dev)
 {
 	x86_msi.restore_msi_irqs(dev);
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -402,11 +402,6 @@ static void xen_pv_teardown_msi_irqs(str
 	xen_teardown_msi_irqs(dev);
 }
 
-static void xen_teardown_msi_irq(unsigned int irq)
-{
-	WARN_ON_ONCE(1);
-}
-
 static int xen_msi_domain_alloc_irqs(struct irq_domain *domain,
 				     struct device *dev,  int nvec)
 {
@@ -483,8 +478,6 @@ static __init void xen_setup_pci_msi(voi
 		return;
 	}
 
-	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
-
 	/*
 	 * Override the PCI/MSI irq domain init function. No point
 	 * in allocating the native domain and never use it.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 36/46] x86/irq: Make most MSI ops XEN private
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (34 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 35/46] x86/irq: Cleanup the arch_*_msi_irqs() leftovers Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-26 11:17 ` [patch V2 37/46] iommu/vt-d: Remove domain search for PCI/MSI[X] Thomas Gleixner
                   ` (17 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Nothing except XEN uses the setup/teardown ops. Hide them there.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/x86_init.h |    2 --
 arch/x86/pci/xen.c              |   21 ++++++++++++++-------
 2 files changed, 14 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -276,8 +276,6 @@ struct x86_platform_ops {
 struct pci_dev;
 
 struct x86_msi_ops {
-	int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
-	void (*teardown_msi_irqs)(struct pci_dev *dev);
 	void (*restore_msi_irqs)(struct pci_dev *dev);
 };
 
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -157,6 +157,13 @@ static int acpi_register_gsi_xen(struct
 struct xen_pci_frontend_ops *xen_pci_frontend;
 EXPORT_SYMBOL_GPL(xen_pci_frontend);
 
+struct xen_msi_ops {
+	int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
+	void (*teardown_msi_irqs)(struct pci_dev *dev);
+};
+
+static struct xen_msi_ops xen_msi_ops __ro_after_init;
+
 static int xen_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 {
 	int irq, ret, i;
@@ -415,7 +422,7 @@ static int xen_msi_domain_alloc_irqs(str
 	else
 		type = PCI_CAP_ID_MSI;
 
-	return x86_msi.setup_msi_irqs(to_pci_dev(dev), nvec, type);
+	return xen_msi_ops.setup_msi_irqs(to_pci_dev(dev), nvec, type);
 }
 
 static void xen_msi_domain_free_irqs(struct irq_domain *domain,
@@ -424,7 +431,7 @@ static void xen_msi_domain_free_irqs(str
 	if (WARN_ON_ONCE(!dev_is_pci(dev)))
 		return;
 
-	x86_msi.teardown_msi_irqs(to_pci_dev(dev));
+	xen_msi_ops.teardown_msi_irqs(to_pci_dev(dev));
 }
 
 static struct msi_domain_ops xen_pci_msi_domain_ops = {
@@ -463,16 +470,16 @@ static __init void xen_setup_pci_msi(voi
 {
 	if (xen_pv_domain()) {
 		if (xen_initial_domain()) {
-			x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
+			xen_msi_ops.setup_msi_irqs = xen_initdom_setup_msi_irqs;
 			x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
 		} else {
-			x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
+			xen_msi_ops.setup_msi_irqs = xen_setup_msi_irqs;
 		}
-		x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
+		xen_msi_ops.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
 		pci_msi_ignore_mask = 1;
 	} else if (xen_hvm_domain()) {
-		x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
-		x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+		xen_msi_ops.setup_msi_irqs = xen_hvm_setup_msi_irqs;
+		xen_msi_ops.teardown_msi_irqs = xen_teardown_msi_irqs;
 	} else {
 		WARN_ON_ONCE(1);
 		return;


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 37/46] iommu/vt-d: Remove domain search for PCI/MSI[X]
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (35 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 36/46] x86/irq: Make most MSI ops XEN private Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-26 11:17 ` [patch V2 38/46] iommu/amd: Remove domain search for PCI/MSI Thomas Gleixner
                   ` (16 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

Now that the domain can be retrieved through device::msi_domain the domain
search for PCI_MSI[X] is not longer required. Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 drivers/iommu/intel/irq_remapping.c |    3 ---
 1 file changed, 3 deletions(-)

--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1132,9 +1132,6 @@ static struct irq_domain *intel_get_irq_
 		return map_ioapic_to_ir(info->devid);
 	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
 		return map_hpet_to_ir(info->devid);
-	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
-	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		return map_dev_to_ir(msi_desc_to_pci_dev(info->desc));
 	default:
 		WARN_ON_ONCE(1);
 		return NULL;


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 38/46] iommu/amd: Remove domain search for PCI/MSI
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (36 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 37/46] iommu/vt-d: Remove domain search for PCI/MSI[X] Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-26 11:17 ` [patch V2 39/46] x86/irq: Add DEV_MSI allocation type Thomas Gleixner
                   ` (15 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

Now that the domain can be retrieved through device::msi_domain the domain
search for PCI_MSI[X] is not longer required. Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 drivers/iommu/amd/iommu.c |    3 ---
 1 file changed, 3 deletions(-)

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3548,9 +3548,6 @@ static struct irq_domain *get_irq_domain
 	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
 	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
 		return iommu->ir_domain;
-	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
-	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		return iommu->msi_domain;
 	default:
 		WARN_ON_ONCE(1);
 		return NULL;


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 39/46] x86/irq: Add DEV_MSI allocation type
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (37 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 38/46] iommu/amd: Remove domain search for PCI/MSI Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-26 11:17 ` [patch V2 40/46] x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI Thomas Gleixner
                   ` (14 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

For the upcoming device MSI support a new allocation type is
required.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/hw_irq.h |    1 +
 1 file changed, 1 insertion(+)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -40,6 +40,7 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_PCI_MSIX,
 	X86_IRQ_ALLOC_TYPE_DMAR,
 	X86_IRQ_ALLOC_TYPE_UV,
+	X86_IRQ_ALLOC_TYPE_DEV_MSI,
 	X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT,
 	X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };



^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 40/46] x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (38 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 39/46] x86/irq: Add DEV_MSI allocation type Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-26 11:17 ` [patch V2 41/46] platform-msi: Provide default irq_chip:: Ack Thomas Gleixner
                   ` (13 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Rename it to x86_msi_prepare() and handle the allocation type setup
depending on the device type.

Add a new arch_msi_prepare define which will be utilized by the upcoming
device MSI support. Define it to NULL if not provided by an architecture in
the generic MSI header.

One arch specific function for MSI support is truly enough.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Polish subject line
---
 arch/x86/include/asm/msi.h          |    4 +++-
 arch/x86/kernel/apic/msi.c          |   27 ++++++++++++++++++++-------
 drivers/pci/controller/pci-hyperv.c |    2 +-
 include/linux/msi.h                 |    4 ++++
 4 files changed, 28 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -6,7 +6,9 @@
 
 typedef struct irq_alloc_info msi_alloc_info_t;
 
-int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
+int x86_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 		    msi_alloc_info_t *arg);
 
+#define arch_msi_prepare		x86_msi_prepare
+
 #endif /* _ASM_X86_MSI_H */
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -181,26 +181,39 @@ static struct irq_chip pci_msi_controlle
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
-int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
-		    msi_alloc_info_t *arg)
+static void pci_msi_prepare(struct device *dev, msi_alloc_info_t *arg)
 {
-	struct pci_dev *pdev = to_pci_dev(dev);
-	struct msi_desc *desc = first_pci_msi_entry(pdev);
+	struct msi_desc *desc = first_msi_entry(dev);
 
-	init_irq_alloc_info(arg, NULL);
 	if (desc->msi_attrib.is_msix) {
 		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
 	} else {
 		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
 		arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
 	}
+}
+
+static void dev_msi_prepare(struct device *dev, msi_alloc_info_t *arg)
+{
+	arg->type = X86_IRQ_ALLOC_TYPE_DEV_MSI;
+}
+
+int x86_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
+		    msi_alloc_info_t *arg)
+{
+	init_irq_alloc_info(arg, NULL);
+
+	if (dev_is_pci(dev))
+		pci_msi_prepare(dev, arg);
+	else
+		dev_msi_prepare(dev, arg);
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(pci_msi_prepare);
+EXPORT_SYMBOL_GPL(x86_msi_prepare);
 
 static struct msi_domain_ops pci_msi_domain_ops = {
-	.msi_prepare	= pci_msi_prepare,
+	.msi_prepare	= x86_msi_prepare,
 };
 
 static struct msi_domain_info pci_msi_domain_info = {
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1532,7 +1532,7 @@ static struct irq_chip hv_msi_irq_chip =
 };
 
 static struct msi_domain_ops hv_msi_ops = {
-	.msi_prepare	= pci_msi_prepare,
+	.msi_prepare	= arch_msi_prepare,
 	.msi_free	= hv_msi_free,
 };
 
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -430,4 +430,8 @@ static inline struct irq_domain *pci_msi
 }
 #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
 
+#ifndef arch_msi_prepare
+# define arch_msi_prepare	NULL
+#endif
+
 #endif /* LINUX_MSI_H */


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 41/46] platform-msi: Provide default irq_chip:: Ack
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (39 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 40/46] x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-26 21:25   ` Marc Zyngier
  2020-08-26 11:17 ` [patch V2 42/46] genirq/proc: Take buslock on affinity write Thomas Gleixner
                   ` (12 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

For the upcoming device MSI support it's required to have a default
irq_chip::ack implementation (irq_chip_ack_parent) so the drivers do not
need to care.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 drivers/base/platform-msi.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -95,6 +95,8 @@ static void platform_msi_update_chip_ops
 		chip->irq_mask = irq_chip_mask_parent;
 	if (!chip->irq_unmask)
 		chip->irq_unmask = irq_chip_unmask_parent;
+	if (!chip->irq_ack)
+		chip->irq_ack = irq_chip_ack_parent;
 	if (!chip->irq_eoi)
 		chip->irq_eoi = irq_chip_eoi_parent;
 	if (!chip->irq_set_affinity)


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 42/46] genirq/proc: Take buslock on affinity write
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (40 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 41/46] platform-msi: Provide default irq_chip:: Ack Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-26 11:17 ` [patch V2 43/46] genirq/msi: Provide and use msi_domain_set_default_info_flags() Thomas Gleixner
                   ` (11 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

Until now interrupt chips which support setting affinity are nit locking
the associated bus lock for two reasons:

 - All chips which support affinity setting do not use buslock because they just
   can operated directly on the hardware.

 - All chips which use buslock do not support affinity setting because
   their interrupt chips are not capable. These chips are usually connected
   over a bus like I2C, SPI etc. and have an interrupt output which is
   conneted to CPU interrupt of some sort. So there is no way to set the
   affinity on the chip itself.

Upcoming hardware which is PCIE based sports a non standard MSI(X) variant
which stores the MSI message in RAM which is associated to e.g. a device
queue. The device manages this RAM and writes have to be issued via command
queues or similar mechanisms which is obviously not possible from interrupt
disabled, raw spinlock held context.

The buslock mechanism of irq chips can be utilized to support that. The
affinity write to the chip writes to shadow state, marks it pending and the
irq chip's irq_bus_sync_unlock() callback handles the command queue and
wait for completion similar to the other chip operations on I2C or SPI
busses.

Change the locking in irq_set_affinity() to bus_lock/unlock to help with
that. There are a few other callers than the proc interface, but none of
them is affected by this change as none of them affects an irq chip with
bus lock support.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 kernel/irq/manage.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -373,16 +373,16 @@ int irq_set_affinity_locked(struct irq_d
 
 int __irq_set_affinity(unsigned int irq, const struct cpumask *mask, bool force)
 {
-	struct irq_desc *desc = irq_to_desc(irq);
+	struct irq_desc *desc;
 	unsigned long flags;
 	int ret;
 
+	desc = irq_get_desc_buslock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);
 	if (!desc)
 		return -EINVAL;
 
-	raw_spin_lock_irqsave(&desc->lock, flags);
 	ret = irq_set_affinity_locked(irq_desc_get_irq_data(desc), mask, force);
-	raw_spin_unlock_irqrestore(&desc->lock, flags);
+	irq_put_desc_busunlock(desc, flags);
 	return ret;
 }
 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 43/46] genirq/msi: Provide and use msi_domain_set_default_info_flags()
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (41 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 42/46] genirq/proc: Take buslock on affinity write Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-27  8:17   ` Marc Zyngier
  2020-08-26 11:17 ` [patch V2 44/46] platform-msi: Add device MSI infrastructure Thomas Gleixner
                   ` (10 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

MSI interrupts have some common flags which should be set not only for
PCI/MSI interrupts.

Move the PCI/MSI flag setting into a common function so it can be reused.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 drivers/pci/msi.c   |    7 +------
 include/linux/msi.h |    1 +
 kernel/irq/msi.c    |   24 ++++++++++++++++++++++++
 3 files changed, 26 insertions(+), 6 deletions(-)

--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1469,12 +1469,7 @@ struct irq_domain *pci_msi_create_irq_do
 	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
 		pci_msi_domain_update_chip_ops(info);
 
-	info->flags |= MSI_FLAG_ACTIVATE_EARLY;
-	if (IS_ENABLED(CONFIG_GENERIC_IRQ_RESERVATION_MODE))
-		info->flags |= MSI_FLAG_MUST_REACTIVATE;
-
-	/* PCI-MSI is oneshot-safe */
-	info->chip->flags |= IRQCHIP_ONESHOT_SAFE;
+	msi_domain_set_default_info_flags(info);
 
 	domain = msi_create_irq_domain(fwnode, info, parent);
 	if (!domain)
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -410,6 +410,7 @@ int platform_msi_domain_alloc(struct irq
 void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq,
 			      unsigned int nvec);
 void *platform_msi_get_host_data(struct irq_domain *domain);
+void msi_domain_set_default_info_flags(struct msi_domain_info *info);
 #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
 
 #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -70,6 +70,30 @@ void get_cached_msi_msg(unsigned int irq
 EXPORT_SYMBOL_GPL(get_cached_msi_msg);
 
 #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+void msi_domain_set_default_info_flags(struct msi_domain_info *info)
+{
+	/* Required so that a device latches a valid MSI message on startup */
+	info->flags |= MSI_FLAG_ACTIVATE_EARLY;
+
+	/*
+	 * Interrupt reservation mode allows to stear the MSI message of an
+	 * inactive device to a special (usually spurious interrupt) target.
+	 * This allows to prevent interrupt vector exhaustion e.g. on x86.
+	 * But (PCI)MSI interrupts are activated early - see above - so the
+	 * interrupt request/startup sequence would not try to allocate a
+	 * usable vector which means that the device interupts would end
+	 * up on the special vector and issue spurious interrupt messages.
+	 * Setting the reactivation flag ensures that when the interrupt
+	 * is requested the activation is invoked again so that a real
+	 * vector can be allocated.
+	 */
+	if (IS_ENABLED(CONFIG_GENERIC_IRQ_RESERVATION_MODE))
+		info->flags |= MSI_FLAG_MUST_REACTIVATE;
+
+	/* MSI is oneshot-safe at least in theory */
+	info->chip->flags |= IRQCHIP_ONESHOT_SAFE;
+}
+
 static inline void irq_chip_write_msi_msg(struct irq_data *data,
 					  struct msi_msg *msg)
 {


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 44/46] platform-msi: Add device MSI infrastructure
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (42 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 43/46] genirq/msi: Provide and use msi_domain_set_default_info_flags() Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-26 11:17 ` [patch V2 45/46] irqdomain/msi: Provide msi_alloc/free_store() callbacks Thomas Gleixner
                   ` (9 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

Add device specific MSI domain infrastructure for devices which have their
own resource management and interrupt chip. These devices are not related
to PCI and contrary to platform MSI they do not share a common resource and
interrupt chip. They provide their own domain specific resource management
and interrupt chip.

This utilizes the new alloc/free override in a non evil way which avoids
having yet another set of specialized alloc/free functions. Just using
msi_domain_alloc/free_irqs() is sufficient

While initially it was suggested and tried to piggyback device MSI on
platform MSI, the better variant is to reimplement platform MSI on top of
device MSI.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 drivers/base/platform-msi.c |  131 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/irqdomain.h   |    1 
 include/linux/msi.h         |   24 ++++++++
 kernel/irq/Kconfig          |    4 +
 4 files changed, 160 insertions(+)

--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -412,3 +412,134 @@ int platform_msi_domain_alloc(struct irq
 
 	return err;
 }
+
+#ifdef CONFIG_DEVICE_MSI
+/*
+ * Device specific MSI domain infrastructure for devices which have their
+ * own resource management and interrupt chip. These devices are not
+ * related to PCI and contrary to platform MSI they do not share a common
+ * resource and interrupt chip. They provide their own domain specific
+ * resource management and interrupt chip.
+ */
+
+static void device_msi_free_msi_entries(struct device *dev)
+{
+	struct list_head *msi_list = dev_to_msi_list(dev);
+	struct msi_desc *entry, *tmp;
+
+	list_for_each_entry_safe(entry, tmp, msi_list, list) {
+		list_del(&entry->list);
+		free_msi_entry(entry);
+	}
+}
+
+/**
+ * device_msi_free_irqs - Free MSI interrupts assigned to  a device
+ * @dev:	Pointer to the device
+ *
+ * Frees the interrupt and the MSI descriptors.
+ */
+static void device_msi_free_irqs(struct irq_domain *domain, struct device *dev)
+{
+	__msi_domain_free_irqs(domain, dev);
+	device_msi_free_msi_entries(dev);
+}
+
+/**
+ * device_msi_alloc_irqs - Allocate MSI interrupts for a device
+ * @dev:	Pointer to the device
+ * @nvec:	Number of vectors
+ *
+ * Allocates the required number of MSI descriptors and the corresponding
+ * interrupt descriptors.
+ */
+static int device_msi_alloc_irqs(struct irq_domain *domain, struct device *dev, int nvec)
+{
+	int i, ret = -ENOMEM;
+
+	for (i = 0; i < nvec; i++) {
+		struct msi_desc *entry = alloc_msi_entry(dev, 1, NULL);
+
+		if (!entry)
+			goto fail;
+		list_add_tail(&entry->list, dev_to_msi_list(dev));
+	}
+
+	ret = __msi_domain_alloc_irqs(domain, dev, nvec);
+	if (!ret)
+		return 0;
+fail:
+	device_msi_free_msi_entries(dev);
+	return ret;
+}
+
+static void device_msi_update_dom_ops(struct msi_domain_info *info)
+{
+	if (!info->ops->domain_alloc_irqs)
+		info->ops->domain_alloc_irqs = device_msi_alloc_irqs;
+	if (!info->ops->domain_free_irqs)
+		info->ops->domain_free_irqs = device_msi_free_irqs;
+	if (!info->ops->msi_prepare)
+		info->ops->msi_prepare = arch_msi_prepare;
+}
+
+/**
+ * device_msi_create_msi_irq_domain - Create an irq domain for devices
+ * @fwnode:	Firmware node of the interrupt controller
+ * @info:	MSI domain info to configure the new domain
+ * @parent:	Parent domain
+ */
+struct irq_domain *device_msi_create_irq_domain(struct fwnode_handle *fn,
+						struct msi_domain_info *info,
+						struct irq_domain *parent)
+{
+	struct irq_domain *domain;
+
+	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
+		platform_msi_update_chip_ops(info);
+
+	if (info->flags & MSI_FLAG_USE_DEF_DOM_OPS)
+		device_msi_update_dom_ops(info);
+
+	msi_domain_set_default_info_flags(info);
+
+	domain = msi_create_irq_domain(fn, info, parent);
+	if (domain)
+		irq_domain_update_bus_token(domain, DOMAIN_BUS_DEVICE_MSI);
+	return domain;
+}
+
+#ifdef CONFIG_PCI
+#include <linux/pci.h>
+
+/**
+ * pci_subdevice_msi_create_irq_domain - Create an irq domain for subdevices
+ * @pdev:	Pointer to PCI device for which the subdevice domain is created
+ * @info:	MSI domain info to configure the new domain
+ */
+struct irq_domain *pci_subdevice_msi_create_irq_domain(struct pci_dev *pdev,
+						       struct msi_domain_info *info)
+{
+	struct irq_domain *domain, *pdev_msi;
+	struct fwnode_handle *fn;
+
+	/*
+	 * Retrieve the MSI domain of the underlying PCI device's MSI
+	 * domain. The PCI device domain's parent domain is also the parent
+	 * domain of the new subdevice domain.
+	 */
+	pdev_msi = dev_get_msi_domain(&pdev->dev);
+	if (!pdev_msi)
+		return NULL;
+
+	fn = irq_domain_alloc_named_fwnode(dev_name(&pdev->dev));
+	if (!fn)
+		return NULL;
+	domain = device_msi_create_irq_domain(fn, info, pdev_msi->parent);
+	if (!domain)
+		irq_domain_free_fwnode(fn);
+	return domain;
+}
+EXPORT_SYMBOL_GPL(pci_subdevice_msi_create_irq_domain);
+#endif /* CONFIG_PCI */
+#endif /* CONFIG_DEVICE_MSI */
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -85,6 +85,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_TI_SCI_INTA_MSI,
 	DOMAIN_BUS_WAKEUP,
 	DOMAIN_BUS_VMD_MSI,
+	DOMAIN_BUS_DEVICE_MSI,
 };
 
 /**
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -56,6 +56,18 @@ struct ti_sci_inta_msi_desc {
 };
 
 /**
+ * device_msi_desc - Device MSI specific MSI descriptor data
+ * @priv:		Pointer to device specific private data
+ * @priv_iomem:		Pointer to device specific private io memory
+ * @hwirq:		The hardware irq number in the device domain
+ */
+struct device_msi_desc {
+	void		*priv;
+	void __iomem	*priv_iomem;
+	u16		hwirq;
+};
+
+/**
  * struct msi_desc - Descriptor structure for MSI based interrupts
  * @list:	List head for management
  * @irq:	The base interrupt number
@@ -127,6 +139,7 @@ struct msi_desc {
 		struct platform_msi_desc platform;
 		struct fsl_mc_msi_desc fsl_mc;
 		struct ti_sci_inta_msi_desc inta;
+		struct device_msi_desc device_msi;
 	};
 };
 
@@ -413,6 +426,17 @@ void *platform_msi_get_host_data(struct
 void msi_domain_set_default_info_flags(struct msi_domain_info *info);
 #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
 
+#ifdef CONFIG_DEVICE_MSI
+struct irq_domain *device_msi_create_irq_domain(struct fwnode_handle *fn,
+						struct msi_domain_info *info,
+						struct irq_domain *parent);
+
+# ifdef CONFIG_PCI
+struct irq_domain *pci_subdevice_msi_create_irq_domain(struct pci_dev *pdev,
+						       struct msi_domain_info *info);
+# endif
+#endif /* CONFIG_DEVICE_MSI */
+
 #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
 void pci_msi_domain_write_msg(struct irq_data *irq_data, struct msi_msg *msg);
 struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -93,6 +93,10 @@ config GENERIC_MSI_IRQ_DOMAIN
 	select IRQ_DOMAIN_HIERARCHY
 	select GENERIC_MSI_IRQ
 
+config DEVICE_MSI
+	bool
+	select GENERIC_MSI_IRQ_DOMAIN
+
 config IRQ_MSI_IOMMU
 	bool
 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 45/46] irqdomain/msi: Provide msi_alloc/free_store() callbacks
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (43 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 44/46] platform-msi: Add device MSI infrastructure Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-26 11:17 ` [patch V2 46/46] irqchip: Add IMS (Interrupt Message Storm) driver - NOT FOR MERGING Thomas Gleixner
                   ` (8 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

From: Thomas Gleixner <tglx@linutronix.de>

For devices which don't have a standard storage for MSI messages like the
upcoming IMS (Interrupt Message Storm) it's required to allocate storage
space before allocating interrupts and after freeing them.

This could be achieved with the existing callbacks, but that would be
awkward because they operate on msi_alloc_info_t which is not uniform
accross architectures. Also these callbacks are invoked per interrupt but
the allocation might have bulk requirements depending on the device.

As such devices can operate on different architectures it is simpler to
have seperate callbacks which operate on struct device. The resulting
storage information has to be stored in struct msi_desc so the underlying
irq chip implementation can retrieve it for the relevant operations.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 include/linux/msi.h |    8 ++++++++
 kernel/irq/msi.c    |   11 +++++++++++
 2 files changed, 19 insertions(+)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -279,6 +279,10 @@ struct msi_domain_info;
  *			function.
  * @domain_free_irqs:	Optional function to override the default free
  *			function.
+ * @msi_alloc_store:	Optional callback to allocate storage in a device
+ *			specific non-standard MSI store
+ * @msi_alloc_free:	Optional callback to free storage in a device
+ *			specific non-standard MSI store
  *
  * @get_hwirq, @msi_init and @msi_free are callbacks used by
  * msi_create_irq_domain() and related interfaces
@@ -328,6 +332,10 @@ struct msi_domain_ops {
 					     struct device *dev, int nvec);
 	void		(*domain_free_irqs)(struct irq_domain *domain,
 					    struct device *dev);
+	int		(*msi_alloc_store)(struct irq_domain *domain,
+					   struct device *dev, int nvec);
+	void		(*msi_free_store)(struct irq_domain *domain,
+					    struct device *dev);
 };
 
 /**
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -434,6 +434,12 @@ int __msi_domain_alloc_irqs(struct irq_d
 	if (ret)
 		return ret;
 
+	if (ops->msi_alloc_store) {
+		ret = ops->msi_alloc_store(domain, dev, nvec);
+		if (ret)
+			return ret;
+	}
+
 	for_each_msi_entry(desc, dev) {
 		ops->set_desc(&arg, desc);
 
@@ -533,6 +539,8 @@ int msi_domain_alloc_irqs(struct irq_dom
 
 void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
 {
+	struct msi_domain_info *info = domain->host_data;
+	struct msi_domain_ops *ops = info->ops;
 	struct msi_desc *desc;
 
 	for_each_msi_entry(desc, dev) {
@@ -546,6 +554,9 @@ void __msi_domain_free_irqs(struct irq_d
 			desc->irq = 0;
 		}
 	}
+
+	if (ops->msi_free_store)
+		ops->msi_free_store(domain, dev);
 }
 
 /**


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch V2 46/46] irqchip: Add IMS (Interrupt Message Storm) driver - NOT FOR MERGING
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (44 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 45/46] irqdomain/msi: Provide msi_alloc/free_store() callbacks Thomas Gleixner
@ 2020-08-26 11:17 ` Thomas Gleixner
  2020-08-31 14:45   ` Jason Gunthorpe
  2020-08-28 11:41 ` [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Joerg Roedel
                   ` (7 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 11:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

Generic IMS irq chips and irq domain implementations for IMS based devices
in both variants:

   - Message store in an array in device memory
   - Message store in system RAM (part of queue memory)

Allocation and freeing of interrupts happens via the generic
msi_domain_alloc/free_irqs() interface. No special purpose IMS magic
required as long as the interrupt domain is stored in the underlying device
struct.
                                                                                                                                                                                                                                                                               
Completely untested of course and mostly for illustration and educational
purpose. This should of course be a modular irq chip, but adding that
support is left as an exercise for the people who care about this deeply.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Reworked to handle both devmem arrays and queue based storage.
---
 drivers/irqchip/Kconfig             |   19 +
 drivers/irqchip/Makefile            |    1 
 drivers/irqchip/irq-ims-msi.c       |  343 ++++++++++++++++++++++++++++++++++++
 include/linux/irqchip/irq-ims-msi.h |   95 +++++++++
 4 files changed, 458 insertions(+)

--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -571,4 +571,23 @@ config LOONGSON_PCH_MSI
 	help
 	  Support for the Loongson PCH MSI Controller.
 
+config IMS_MSI
+	depends on PCI
+	select DEVICE_MSI
+	bool
+
+config IMS_MSI_ARRAY
+	bool "IMS Interrupt Message Storm MSI controller for device memory storage arrays"
+	select IMS_MSI
+	help
+	  Support for IMS Interrupt Message Storm MSI controller
+	  with IMS slot storage in a slot array in device memory
+
+config IMS_MSI_QUEUE
+	bool "IMS Interrupt Message Storm MSI controller for IMS queue storage"
+	select IMS_MSI
+	help
+	  Support for IMS Interrupt Message Storm MSI controller
+	  with IMS slot storage in the queue storage of a device
+
 endmenu
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -111,3 +111,4 @@ obj-$(CONFIG_LOONGSON_HTPIC)		+= irq-loo
 obj-$(CONFIG_LOONGSON_HTVEC)		+= irq-loongson-htvec.o
 obj-$(CONFIG_LOONGSON_PCH_PIC)		+= irq-loongson-pch-pic.o
 obj-$(CONFIG_LOONGSON_PCH_MSI)		+= irq-loongson-pch-msi.o
+obj-$(CONFIG_IMS_MSI)			+= irq-ims-msi.o
--- /dev/null
+++ b/drivers/irqchip/irq-ims-msi.c
@@ -0,0 +1,343 @@
+// SPDX-License-Identifier: GPL-2.0
+// (C) Copyright 2020 Thomas Gleixner <tglx@linutronix.de>
+/*
+ * Shared interrupt chips and irq domains for IMS devices
+ */
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/msi.h>
+#include <linux/irq.h>
+
+#include <linux/irqchip/irq-ims-msi.h>
+
+#ifdef CONFIG_IMS_ARRAY
+
+struct ims_array_data {
+	struct ims_array_info	info;
+	unsigned long		map[0];
+};
+
+static void ims_array_mask_irq(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
+	u32 __iomem *ctrl = &slot->ctrl;
+
+	iowrite32(ioread32(ctrl) & ~IMS_VECTOR_CTRL_UNMASK, ctrl);
+}
+
+static void ims_array_unmask_irq(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
+	u32 __iomem *ctrl = &slot->ctrl;
+
+	iowrite32(ioread32(ctrl) | IMS_VECTOR_CTRL_UNMASK, ctrl);
+}
+
+static void ims_array_write_msi_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
+
+	iowrite32(msg->address_lo, &slot->address_lo);
+	iowrite32(msg->address_hi, &slot->address_hi);
+	iowrite32(msg->data, &slot->data);
+}
+
+static const struct irq_chip ims_array_msi_controller = {
+	.name			= "IMS",
+	.irq_mask		= ims_array_mask_irq,
+	.irq_unmask		= ims_array_unmask_irq,
+	.irq_write_msi_msg	= ims_array_write_msi_msg,
+	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.flags			= IRQCHIP_SKIP_SET_WAKE,
+};
+
+static void ims_array_reset_slot(struct ims_slot __iomem *slot)
+{
+	iowrite32(0, &slot->address_lo);
+	iowrite32(0, &slot->address_hi);
+	iowrite32(0, &slot->data);
+	iowrite32(0, &slot->ctrl);
+}
+
+static void ims_array_free_msi_store(struct irq_domain *domain,
+				     struct device *dev)
+{
+	struct msi_domain_info *info = domain->host_data;
+	struct ims_array_data *ims = info->data;
+	struct msi_desc *entry;
+
+	for_each_msi_entry(entry, dev) {
+		if (entry->device_msi.priv_iomem) {
+			clear_bit(entry->device_msi.hwirq, ims->map);
+			ims_array_reset_slot(entry->device_msi.priv_iomem);
+			entry->device_msi.priv_iomem = NULL;
+			entry->device_msi.hwirq = 0;
+		}
+	}
+}
+
+static int ims_array_alloc_msi_store(struct irq_domain *domain,
+				     struct device *dev, int nvec)
+{
+	struct msi_domain_info *info = domain->host_data;
+	struct ims_array_data *ims = info->data;
+	struct msi_desc *entry;
+
+	for_each_msi_entry(entry, dev) {
+		unsigned int idx;
+
+		idx = find_first_zero_bit(ims->map, ims->info.max_slots);
+		if (idx >= ims->info.max_slots)
+			goto fail;
+		set_bit(idx, ims->map);
+		entry->device_msi.priv_iomem = &ims->info.slots[idx];
+		entry->device_msi.hwirq = idx;
+	}
+	return 0;
+
+fail:
+	ims_array_free_msi_store(domain, dev);
+	return -ENOSPC;
+}
+
+struct ims_domain_template {
+	struct msi_domain_ops	ops;
+	struct msi_domain_info	info;
+};
+
+static const struct ims_array_domain_template ims_array_domain_template = {
+	.ops = {
+		.msi_alloc_store	= ims_array_alloc_msi_store,
+		.msi_free_store		= ims_array_free_msi_store,
+	},
+	.info = {
+		.flags		= MSI_FLAG_USE_DEF_DOM_OPS |
+				  MSI_FLAG_USE_DEF_CHIP_OPS,
+		.handler	= handle_edge_irq,
+		.handler_name	= "edge",
+	},
+};
+
+struct irq_domain *
+pci_ims_array_create_msi_irq_domain(struct pci_dev *pdev,
+				    struct ims_array_info *ims_info)
+{
+	struct ims_domain_template *info;
+	struct ims_array_data *data;
+	struct irq_domain *domain;
+	struct irq_chip *chip;
+	unsigned int size;
+
+	/* Allocate new domain storage */
+	info = kmemdup(&ims_array_domain_template,
+		       sizeof(ims_array_domain_template), GFP_KERNEL);
+	if (!info)
+		return NULL;
+	/* Link the ops */
+	info->info.ops = &info->ops;
+
+	/* Allocate ims_info along with the bitmap */
+	size = sizeof(*data);
+	size += BITS_TO_LONGS(ims_info->max_slots) * sizeof(unsigned long);
+	data = kzalloc(size, GFP_KERNEL);
+	if (!data)
+		goto err_info;
+
+	data->info = *ims_info;
+	info->info.data = data;
+
+	/*
+	 * Allocate an interrupt chip because the core needs to be able to
+	 * update it with default callbacks.
+	 */
+	chip = kmemdup(&ims_array_msi_controller,
+		       sizeof(ims_array_msi_controller), GFP_KERNEL);
+	if (!chip)
+		goto err_data;
+	info->info.chip = chip;
+
+	domain = pci_subdevice_msi_create_irq_domain(pdev, &info->info);
+	if (!domain)
+		goto err_chip;
+
+	return domain;
+
+err_chip:
+	kfree(chip);
+err_data:
+	kfree(data);
+err_info:
+	kfree(info);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(pci_ims_array_create_msi_irq_domain);
+
+#endif /* CONFIG_IMS_ARRAY */
+
+#ifdef CONFIG_IMS_QUEUE
+
+static void ims_queue_mask_irq(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot *slot = desc->device_msi.priv;
+
+	slot->ctrl &= ~IMS_VECTOR_CTRL_UNMASK;
+}
+
+static void ims_queue_unmask_irq(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot *slot = desc->device_msi.priv;
+
+	slot->ctrl |= IMS_VECTOR_CTRL_UNMASK;
+}
+
+static void ims_queue_write_msi_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot *slot = desc->device_msi.priv;
+
+	slot->address_lo = msg->address_lo;
+	slot->address_hi = msg->address_hi;
+	slot->data = msg->data;
+}
+
+static void ims_queue_lock(struct irq_data *data)
+{
+	struct ims_queue_info *ims = irq_data_get_chip_data(data);
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+
+	ims->queue_lock(desc->device_msi.priv);
+}
+
+static void ims_queue_sync_unlock(struct irq_data *data)
+{
+	struct ims_queue_info *ims = irq_data_get_chip_data(data);
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+
+	ims->queue_sync_unlock(desc->device_msi.priv);
+}
+
+static const struct irq_chip ims_queue_msi_controller = {
+	.name			= "IMS",
+	.irq_mask		= ims_queue_mask_irq,
+	.irq_unmask		= ims_queue_unmask_irq,
+	.irq_disable		= ims_queue_mask_irq,
+	.irq_enable		= ims_queue_unmask_irq,
+	.irq_write_msi_msg	= ims_queue_write_msi_msg,
+	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_bus_lock		= ims_queue_lock,
+	.irq_bus_sync_unlock	= ims_queue_sync_unlock,
+	.flags			= IRQCHIP_SKIP_SET_WAKE,
+};
+
+static void ims_queue_reset_slot(struct ims_slot *slot)
+{
+	memset(slot, 0, sizeof(*slot));
+}
+
+static void ims_queue_free_msi_store(struct irq_domain *domain, struct device *dev)
+{
+	struct msi_domain_info *info = domain->host_data;
+	struct ims_queue_info *ims = info->chip_data;
+	struct msi_desc *entry;
+
+	for_each_msi_entry(entry, dev) {
+		if (entry->device_msi.priv) {
+			ims_queue_reset_slot(entry->device_msi.priv);
+			entry->device_msi.priv = NULL;
+			entry->device_msi.hwirq = 0;
+		}
+	}
+}
+
+static int ims_queue_alloc_msi_store(struct irq_domain *domain,
+				     struct device *dev,
+				     int nvec)
+{
+	struct msi_domain_info *info = domain->host_data;
+	struct ims_queue_info *ims = info->chip_data;
+	struct msi_desc *entry;
+	struct ims_slot *slot;
+	int idx = 0;
+
+	for_each_msi_entry(entry, dev) {
+		slot = ims->queue_get_shadow(dev, idx);
+		if (!slot)
+			goto fail;
+		entry->device_msi.priv = slot;
+		entry->device_msi.hwirq = (dev->id << 16) | idx;
+		idx++;
+	}
+	return 0;
+
+fail:
+	ims_queue_free_msi_store(domain, dev);
+	return -ENOSPC;
+}
+
+struct ims_domain_template {
+	struct msi_domain_ops	ops;
+	struct msi_domain_info	info;
+};
+
+static const struct ims_queue_domain_template ims_queue_domain_template = {
+	.ops = {
+		.msi_alloc_store	= ims_queue_alloc_msi_store,
+		.msi_free_store		= ims_queue_free_msi_store,
+	},
+	.info = {
+		.flags		= MSI_FLAG_USE_DEF_DOM_OPS |
+				  MSI_FLAG_USE_DEF_CHIP_OPS,
+		.handler	= handle_edge_irq,
+		.handler_name	= "edge",
+	},
+};
+
+struct irq_domain *
+pci_ims_queue_create_msi_irq_domain(struct pci_dev *pdev,
+				    struct ims_queue_info *ims_info)
+{
+	struct ims_domain_template *info;
+	struct irq_domain *domain;
+	struct irq_chip *chip;
+	unsigned int size;
+
+	/* Allocate new domain storage */
+	info = kmemdup(&ims_queue_domain_template,
+			  sizeof(ims_queue_domain_template), GFP_KERNEL);
+	if (!info)
+		return NULL;
+	/* Link the ops */
+	info->info.ops = &info->ops;
+
+	info->info.chip_data = ims_info;
+
+	/*
+	 * Allocate an interrupt chip because the core needs to be able to
+	 * update it with default callbacks.
+	 */
+	chip = kmemdup(&ims_queue_msi_controller,
+		       sizeof(ims_queue_msi_controller), GFP_KERNEL);
+	if (!chip)
+		goto err_info;
+	info->info.chip = chip;
+
+	domain = pci_subdevice_msi_create_irq_domain(pdev, &info->info);
+	if (!domain)
+		goto err_chip;
+
+	return domain;
+
+err_chip:
+	kfree(chip);
+err_info:
+	kfree(info);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(pci_ims_queue_create_msi_irq_domain);
+
+#endif /* CONFIG_IMS_ARRAY */
--- /dev/null
+++ b/include/linux/irqchip/irq-ims-msi.h
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* (C) Copyright 2020 Thomas Gleixner <tglx@linutronix.de> */
+
+#ifndef _LINUX_IRQCHIP_IRQ_IMS_MSI_H
+#define _LINUX_IRQCHIP_IRQ_IMS_MSI_H
+
+#include <linux/types.h>
+
+/**
+ * ims_hw_slot - The hardware layout of an IMS based MSI message
+ * @address_lo:	Lower 32bit address
+ * @address_hi:	Upper 32bit address
+ * @data:	Message data
+ * @ctrl:	Control word
+ *
+ * This structure is used by both the device memory array and the queue
+ * memory variants of IMS.
+ */
+struct ims_slot {
+	u32	address_lo;
+	u32	address_hi;
+	u32	data;
+	u32	ctrl;
+} __packed;
+
+/* Bit to unmask the interrupt in ims_hw_slot::ctrl */
+#define IMS_VECTOR_CTRL_UNMASK	0x01
+
+/**
+ * struct ims_array_info - Information to create an IMS array domain
+ * @slots:	Pointer to the start of the array
+ * @max_slots:	Maximum number of slots in the array
+ */
+struct ims_array_info {
+	struct ims_slot		__iomem *slots;
+	unsigned int		max_slots;
+};
+
+/**
+ * ims_queue_info - Information to create an IMS queue domain
+ * @queue_lock:		Callback which informs the device driver that
+ *			an interrupt management operation starts.
+ * @queue_sync_unlock:	Callback which informs the device driver that an
+ *			interrupt management operation ends.
+
+ * @queue_get_shadow:   Callback to retrieve te shadow storage for a MSI
+ *			entry associated to a queue. The queue is
+ *			identified by the device struct which is used for
+ *			allocating interrupts and the msi entry index.
+ *
+ * @queue_lock() and @queue_sync_unlock() are only called for management
+ * operations on a particular interrupt: request, free, enable, disable,
+ * affinity setting.  These functions are never called from atomic context,
+ * like low level interrupt handling code. The purpose of these functions
+ * is to signal the device driver the start and end of an operation which
+ * affects the IMS queue shadow state. @queue_lock() allows the driver to
+ * do preperatory work, e.g. locking. Note, that @queue_lock() has to
+ * preserve the sleepable state on return. That means the driver cannot
+ * disable preemption and (soft)interrupts in @queue_lock and then undo
+ * that operation in @queue_sync_unlock() which restricts the lock types
+ * for eventual serialization of these operations to sleepable locks. Of
+ * course the driver can disable preemption and (soft)interrupts
+ * temporarily for internal work.
+ *
+ * On @queue_sync_unlock() the driver has to check whether the shadow state
+ * changed and issue a command to update the hardware state and wait for
+ * the command to complete. If the command fails or times out then the
+ * driver has to take care of the resulting mess as this is called from
+ * functions which have no return value and none of the callers can deal
+ * with the failure. The lock which is used by the driver to protect a
+ * operation sequence must obviously not be released before the command
+ * completes or fails. Otherwise new operations on the same interrupt line
+ * could take place and change the shadow state before the driver was able
+ * to compose the command.
+ *
+ * The serialization is not provided by the common interrupt chip and also
+ * not by the core code to let the driver decide about the granularity of
+ * the locking and the lock type.
+ */
+struct ims_queue_info {
+	void		(*queue_lock)(struct ims_slot *shadow);
+	void		(*queue_sync_unlock)(struct ims_slot *shadow);
+	struct ims_slot	*(*queue_get_shadow)(struct device *dev, unsigned int msi_index);
+};
+
+struct pci_dev;
+struct irq_domain;
+
+struct irq_domain *pci_ims_array_create_msi_irq_domain(struct pci_dev *pdev,
+						       struct ims_array_info *ims_info);
+
+struct irq_domain *pci_ims_queue_create_msi_irq_domain(struct pci_dev *pdev,
+						       struct ims_queue_info *ims_info);
+
+#endif


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-08-26 11:17 ` [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable Thomas Gleixner
@ 2020-08-26 15:53   ` Thomas Gleixner
  2020-08-26 21:14   ` Marc Zyngier
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 15:53 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, Aug 26 2020 at 13:17, Thomas Gleixner wrote:
> + * If CONFIG_PCI_MSI_ARCH_FALLBACKS is not selected they are replaced by
> + * stubs with warnings.
>   */
> +#ifdef CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS

Groan, I obviously failed to pull that back from the test box where I
fixed it. That wants to be:

+#ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS

Doing five things at once does not work well

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 15/46] x86/irq: Consolidate DMAR irq allocation
  2020-08-26 11:16 ` [patch V2 15/46] x86/irq: Consolidate DMAR irq allocation Thomas Gleixner
@ 2020-08-26 16:50   ` Dey, Megha
  2020-08-26 18:32     ` Thomas Gleixner
  0 siblings, 1 reply; 112+ messages in thread
From: Dey, Megha @ 2020-08-26 16:50 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

Hi Thomas,

On 8/26/2020 4:16 AM, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> None of the DMAR specific fields are required.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> ---
>   arch/x86/include/asm/hw_irq.h |    6 ------
>   arch/x86/kernel/apic/msi.c    |   10 +++++-----
>   2 files changed, 5 insertions(+), 11 deletions(-)
>
> --- a/arch/x86/include/asm/hw_irq.h
> +++ b/arch/x86/include/asm/hw_irq.h
> @@ -83,12 +83,6 @@ struct irq_alloc_info {
>   			irq_hw_number_t	msi_hwirq;
>   		};
>   #endif
> -#ifdef	CONFIG_DMAR_TABLE
> -		struct {
> -			int		dmar_id;
> -			void		*dmar_data;
> -		};
> -#endif
>   #ifdef	CONFIG_X86_UV
>   		struct {
>   			int		uv_limit;
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -329,15 +329,15 @@ static struct irq_chip dmar_msi_controll
>   static irq_hw_number_t dmar_msi_get_hwirq(struct msi_domain_info *info,
>   					  msi_alloc_info_t *arg)
>   {
> -	return arg->dmar_id;
> +	return arg->hwirq;

Shouldn't this return the arg->devid which gets set in dmar_alloc_hwirq?

-Megha

>   }
>   
>   static int dmar_msi_init(struct irq_domain *domain,
>   			 struct msi_domain_info *info, unsigned int virq,
>   			 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
>   {
> -	irq_domain_set_info(domain, virq, arg->dmar_id, info->chip, NULL,
> -			    handle_edge_irq, arg->dmar_data, "edge");
> +	irq_domain_set_info(domain, virq, arg->devid, info->chip, NULL,
> +			    handle_edge_irq, arg->data, "edge");
>   
>   	return 0;
>   }
> @@ -384,8 +384,8 @@ int dmar_alloc_hwirq(int id, int node, v
>   
>   	init_irq_alloc_info(&info, NULL);
>   	info.type = X86_IRQ_ALLOC_TYPE_DMAR;
> -	info.dmar_id = id;
> -	info.dmar_data = arg;
> +	info.devid = id;
> +	info.data = arg;
>   
>   	return irq_domain_alloc_irqs(domain, 1, node, &info);
>   }
>
>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 15/46] x86/irq: Consolidate DMAR irq allocation
  2020-08-26 16:50   ` Dey, Megha
@ 2020-08-26 18:32     ` Thomas Gleixner
  2020-08-26 20:50       ` Thomas Gleixner
  0 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 18:32 UTC (permalink / raw)
  To: Dey, Megha, LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, Aug 26 2020 at 09:50, Megha Dey wrote:
>> @@ -329,15 +329,15 @@ static struct irq_chip dmar_msi_controll
>>   static irq_hw_number_t dmar_msi_get_hwirq(struct msi_domain_info *info,
>>   					  msi_alloc_info_t *arg)
>>   {
>> -	return arg->dmar_id;
>> +	return arg->hwirq;
>
> Shouldn't this return the arg->devid which gets set in dmar_alloc_hwirq?

Indeed.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 29/46] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
  2020-08-26 11:16 ` [patch V2 29/46] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs() Thomas Gleixner
@ 2020-08-26 19:06   ` Marc Zyngier
  2020-08-26 19:47     ` Thomas Gleixner
  2020-08-28  0:24   ` Dey, Megha
  1 sibling, 1 reply; 112+ messages in thread
From: Marc Zyngier @ 2020-08-26 19:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, 26 Aug 2020 12:16:57 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> To support MSI irq domains which do not fit at all into the regular MSI
> irqdomain scheme, like the XEN MSI interrupt management for PV/HVM/DOM0,
> it's necessary to allow to override the alloc/free implementation.
> 
> This is a preperatory step to switch X86 away from arch_*_msi_irqs() and
> store the irq domain pointer right in struct device.
> 
> No functional change for existing MSI irq domain users.
> 
> Aside of the evil XEN wrapper this is also useful for special MSI domains
> which need to do extra alloc/free work before/after calling the generic
> core function. Work like allocating/freeing MSI descriptors, MSI storage
> space etc.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> ---
>  include/linux/msi.h |   27 ++++++++++++++++++++
>  kernel/irq/msi.c    |   70 +++++++++++++++++++++++++++++++++++-----------------
>  2 files changed, 75 insertions(+), 22 deletions(-)
> 
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -241,6 +241,10 @@ struct msi_domain_info;
>   * @msi_finish:		Optional callback to finalize the allocation
>   * @set_desc:		Set the msi descriptor for an interrupt
>   * @handle_error:	Optional error handler if the allocation fails
> + * @domain_alloc_irqs:	Optional function to override the default allocation
> + *			function.
> + * @domain_free_irqs:	Optional function to override the default free
> + *			function.
>   *
>   * @get_hwirq, @msi_init and @msi_free are callbacks used by
>   * msi_create_irq_domain() and related interfaces
> @@ -248,6 +252,22 @@ struct msi_domain_info;
>   * @msi_check, @msi_prepare, @msi_finish, @set_desc and @handle_error
>   * are callbacks used by msi_domain_alloc_irqs() and related
>   * interfaces which are based on msi_desc.
> + *
> + * @domain_alloc_irqs, @domain_free_irqs can be used to override the
> + * default allocation/free functions (__msi_domain_alloc/free_irqs). This
> + * is initially for a wrapper around XENs seperate MSI universe which can't
> + * be wrapped into the regular irq domains concepts by mere mortals.  This
> + * allows to universally use msi_domain_alloc/free_irqs without having to
> + * special case XEN all over the place.
> + *
> + * Contrary to other operations @domain_alloc_irqs and @domain_free_irqs
> + * are set to the default implementation if NULL and even when
> + * MSI_FLAG_USE_DEF_DOM_OPS is not set to avoid breaking existing users and
> + * because these callbacks are obviously mandatory.
> + *
> + * This is NOT meant to be abused, but it can be useful to build wrappers
> + * for specialized MSI irq domains which need extra work before and after
> + * calling __msi_domain_alloc_irqs()/__msi_domain_free_irqs().
>   */
>  struct msi_domain_ops {
>  	irq_hw_number_t	(*get_hwirq)(struct msi_domain_info *info,
> @@ -270,6 +290,10 @@ struct msi_domain_ops {
>  				    struct msi_desc *desc);
>  	int		(*handle_error)(struct irq_domain *domain,
>  					struct msi_desc *desc, int error);
> +	int		(*domain_alloc_irqs)(struct irq_domain *domain,
> +					     struct device *dev, int nvec);
> +	void		(*domain_free_irqs)(struct irq_domain *domain,
> +					    struct device *dev);
>  };
>  
>  /**
> @@ -327,8 +351,11 @@ int msi_domain_set_affinity(struct irq_d
>  struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode,
>  					 struct msi_domain_info *info,
>  					 struct irq_domain *parent);
> +int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
> +			    int nvec);
>  int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
>  			  int nvec);
> +void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev);
>  void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev);
>  struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain);
>  
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -229,11 +229,13 @@ static int msi_domain_ops_check(struct i
>  }
>  
>  static struct msi_domain_ops msi_domain_ops_default = {
> -	.get_hwirq	= msi_domain_ops_get_hwirq,
> -	.msi_init	= msi_domain_ops_init,
> -	.msi_check	= msi_domain_ops_check,
> -	.msi_prepare	= msi_domain_ops_prepare,
> -	.set_desc	= msi_domain_ops_set_desc,
> +	.get_hwirq		= msi_domain_ops_get_hwirq,
> +	.msi_init		= msi_domain_ops_init,
> +	.msi_check		= msi_domain_ops_check,
> +	.msi_prepare		= msi_domain_ops_prepare,
> +	.set_desc		= msi_domain_ops_set_desc,
> +	.domain_alloc_irqs	= __msi_domain_alloc_irqs,
> +	.domain_free_irqs	= __msi_domain_free_irqs,
>  };
>  
>  static void msi_domain_update_dom_ops(struct msi_domain_info *info)
> @@ -245,6 +247,14 @@ static void msi_domain_update_dom_ops(st
>  		return;
>  	}
>  
> +	if (ops->domain_alloc_irqs == NULL)
> +		ops->domain_alloc_irqs = msi_domain_ops_default.domain_alloc_irqs;
> +	if (ops->domain_free_irqs == NULL)
> +		ops->domain_free_irqs = msi_domain_ops_default.domain_free_irqs;
> +
> +	if (!(info->flags & MSI_FLAG_USE_DEF_DOM_OPS))
> +		return;
> +
>  	if (ops->get_hwirq == NULL)
>  		ops->get_hwirq = msi_domain_ops_default.get_hwirq;
>  	if (ops->msi_init == NULL)
> @@ -278,8 +288,7 @@ struct irq_domain *msi_create_irq_domain
>  {
>  	struct irq_domain *domain;
>  
> -	if (info->flags & MSI_FLAG_USE_DEF_DOM_OPS)
> -		msi_domain_update_dom_ops(info);
> +	msi_domain_update_dom_ops(info);
>  	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
>  		msi_domain_update_chip_ops(info);
>  
> @@ -386,17 +395,8 @@ static bool msi_check_reservation_mode(s
>  	return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit;
>  }
>  
> -/**
> - * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain
> - * @domain:	The domain to allocate from
> - * @dev:	Pointer to device struct of the device for which the interrupts
> - *		are allocated
> - * @nvec:	The number of interrupts to allocate
> - *
> - * Returns 0 on success or an error code.
> - */
> -int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
> -			  int nvec)
> +int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
> +			    int nvec)
>  {
>  	struct msi_domain_info *info = domain->host_data;
>  	struct msi_domain_ops *ops = info->ops;
> @@ -490,12 +490,24 @@ int msi_domain_alloc_irqs(struct irq_dom
>  }
>  
>  /**
> - * msi_domain_free_irqs - Free interrupts from a MSI interrupt @domain associated tp @dev
> - * @domain:	The domain to managing the interrupts
> + * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain
> + * @domain:	The domain to allocate from
>   * @dev:	Pointer to device struct of the device for which the interrupts
> - *		are free
> + *		are allocated
> + * @nvec:	The number of interrupts to allocate
> + *
> + * Returns 0 on success or an error code.
>   */
> -void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
> +int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
> +			  int nvec)
> +{
> +	struct msi_domain_info *info = domain->host_data;
> +	struct msi_domain_ops *ops = info->ops;

Rework leftovers, I imagine.

> +
> +	return ops->domain_alloc_irqs(domain, dev, nvec);
> +}
> +
> +void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
>  {
>  	struct msi_desc *desc;
>  
> @@ -513,6 +525,20 @@ void msi_domain_free_irqs(struct irq_dom
>  }
>  
>  /**
> + * __msi_domain_free_irqs - Free interrupts from a MSI interrupt @domain associated tp @dev

Spurious __.

> + * @domain:	The domain to managing the interrupts
> + * @dev:	Pointer to device struct of the device for which the interrupts
> + *		are free
> + */
> +void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
> +{
> +	struct msi_domain_info *info = domain->host_data;
> +	struct msi_domain_ops *ops = info->ops;

Same thing?

> +
> +	return ops->domain_free_irqs(domain, dev);
> +}
> +
> +/**
>   * msi_get_domain_info - Get the MSI interrupt domain info for @domain
>   * @domain:	The interrupt domain to retrieve data from
>   *

Otherwise looks good to me:

Reviewed-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 29/46] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
  2020-08-26 19:06   ` Marc Zyngier
@ 2020-08-26 19:47     ` Thomas Gleixner
  2020-08-26 21:33       ` Marc Zyngier
  0 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 19:47 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, Aug 26 2020 at 20:06, Marc Zyngier wrote:
> On Wed, 26 Aug 2020 12:16:57 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>>  /**
>> - * msi_domain_free_irqs - Free interrupts from a MSI interrupt @domain associated tp @dev
>> - * @domain:	The domain to managing the interrupts
>> + * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain
>> + * @domain:	The domain to allocate from
>>   * @dev:	Pointer to device struct of the device for which the interrupts
>> - *		are free
>> + *		are allocated
>> + * @nvec:	The number of interrupts to allocate
>> + *
>> + * Returns 0 on success or an error code.
>>   */
>> -void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
>> +int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
>> +			  int nvec)
>> +{
>> +	struct msi_domain_info *info = domain->host_data;
>> +	struct msi_domain_ops *ops = info->ops;
>
> Rework leftovers, I imagine.

Hmm, no. How would it call ops->domain_alloc_irqs() without getting the
ops. I know, that the diff is horrible, but don't blame me for it. diff
sucks at times.

>> +
>> +	return ops->domain_alloc_irqs(domain, dev, nvec);
>> +}
>> +
>> +void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
>>  {
>>  	struct msi_desc *desc;
>>  
>> @@ -513,6 +525,20 @@ void msi_domain_free_irqs(struct irq_dom
>>  }
>>  
>>  /**
>> + * __msi_domain_free_irqs - Free interrupts from a MSI interrupt @domain associated tp @dev
>
> Spurious __.

Yup.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 04/46] genirq/chip: Use the first chip in irq_chip_compose_msi_msg()
  2020-08-26 11:16 ` [patch V2 04/46] genirq/chip: Use the first chip in irq_chip_compose_msi_msg() Thomas Gleixner
@ 2020-08-26 19:50   ` Marc Zyngier
  2020-08-26 21:19     ` Thomas Gleixner
  0 siblings, 1 reply; 112+ messages in thread
From: Marc Zyngier @ 2020-08-26 19:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, 26 Aug 2020 12:16:32 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> The documentation of irq_chip_compose_msi_msg() claims that with
> hierarchical irq domains the first chip in the hierarchy which has an
> irq_compose_msi_msg() callback is chosen. But the code just keeps
> iterating after it finds a chip with a compose callback.
> 
> The x86 HPET MSI implementation relies on that behaviour, but that does not
> make it more correct.
> 
> The message should always be composed at the domain which manages the
> underlying resource (e.g. APIC or remap table) because that domain knows
> about the required layout of the message.
> 
> On X86 the following hierarchies exist:
> 
> 1)   vector -------- PCI/MSI
> 2)   vector -- IR -- PCI/MSI
> 
> The vector domain has a different message format than the IR (remapping)
> domain. So obviously the PCI/MSI domain can't compose the message without
> having knowledge about the parent domain, which is exactly the opposite of
> what hierarchical domains want to achieve.
> 
> X86 actually has two different PCI/MSI chips where #1 has a compose
> callback and #2 does not. #2 delegates the composition to the remap domain
> where it belongs, but #1 does it at the PCI/MSI level.
> 
> For the upcoming device MSI support it's necessary to change this and just
> let the first domain which can compose the message take care of it. That
> way the top level chip does not have to worry about it and the device MSI
> code does not need special knowledge about topologies. It just sets the
> compose callback to NULL and lets the hierarchy pick the first chip which
> has one.
> 
> Due to that the attempt to move the compose callback from the direct
> delivery PCI/MSI domain to the vector domain made the system fail to boot
> with interrupt remapping enabled because in the remapping case
> irq_chip_compose_msi_msg() keeps iterating and choses the compose callback
> of the vector domain which obviously creates the wrong format for the remap
> table.
> 
> Break out of the loop when the first irq chip with a compose callback is
> found and fixup the HPET code temporarily. That workaround will be removed
> once the direct delivery compose callback is moved to the place where it
> belongs in the vector domain.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: New patch. Note, that this might break other stuff which relies on the
>     current behaviour, but the hierarchy composition of DT based chips is
>     really hard to follow.

Grepping around, I don't think there is any occurrence of two irqchips
providing irq_compose_msi() that can share a hierarchy on any real
system, so we should be fine. Famous last words.

> ---
>  arch/x86/kernel/apic/msi.c |    7 +++++--
>  kernel/irq/chip.c          |   12 +++++++++---
>  2 files changed, 14 insertions(+), 5 deletions(-)
> 
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -479,10 +479,13 @@ struct irq_domain *hpet_create_irq_domai
>  	info.type = X86_IRQ_ALLOC_TYPE_HPET;
>  	info.hpet_id = hpet_id;
>  	parent = irq_remapping_get_ir_irq_domain(&info);
> -	if (parent == NULL)
> +	if (parent == NULL) {
>  		parent = x86_vector_domain;
> -	else
> +	} else {
>  		hpet_msi_controller.name = "IR-HPET-MSI";
> +		/* Temporary fix: Will go away */
> +		hpet_msi_controller.irq_compose_msi_msg = NULL;
> +	}
>  
>  	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
>  					      hpet_id);
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -1544,10 +1544,16 @@ int irq_chip_compose_msi_msg(struct irq_
>  	struct irq_data *pos = NULL;
>  
>  #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
> -	for (; data; data = data->parent_data)
> -#endif
> -		if (data->chip && data->chip->irq_compose_msi_msg)
> +	for (; data; data = data->parent_data) {
> +		if (data->chip && data->chip->irq_compose_msi_msg) {
>  			pos = data;
> +			break;
> +		}
> +	}
> +#else
> +	if (data->chip && data->chip->irq_compose_msi_msg)
> +		pos = data;
> +#endif
>  	if (!pos)
>  		return -ENOSYS;
>  
> 
> 

Is it just me, or is this last change more complex than it ought to be?

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 857f5f4c8098..25e18b73699c 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1544,7 +1544,7 @@ int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	struct irq_data *pos = NULL;
 
 #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
-	for (; data; data = data->parent_data)
+	for (; data && !pos; data = data->parent_data)
 #endif
 		if (data->chip && data->chip->irq_compose_msi_msg)
 			pos = data;

Though the for loop in a #ifdef in admittedly an acquired taste...

Reviewed-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply related	[flat|nested] 112+ messages in thread

* Re: [patch V2 19/46] x86/msi: Use generic MSI domain ops
  2020-08-26 11:16 ` [patch V2 19/46] x86/msi: Use generic MSI domain ops Thomas Gleixner
@ 2020-08-26 20:21   ` Marc Zyngier
  2020-08-26 20:43     ` Thomas Gleixner
  0 siblings, 1 reply; 112+ messages in thread
From: Marc Zyngier @ 2020-08-26 20:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, 26 Aug 2020 12:16:47 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> pci_msi_get_hwirq() and pci_msi_set_desc are not longer special. Enable the
> generic MSI domain ops in the core and PCI MSI code unconditionally and get
> rid of the x86 specific implementations in the X86 MSI code and in the
> hyperv PCI driver.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> ---
>  arch/x86/include/asm/msi.h          |    2 --
>  arch/x86/kernel/apic/msi.c          |   15 ---------------
>  drivers/pci/controller/pci-hyperv.c |    8 --------
>  drivers/pci/msi.c                   |    4 ----
>  kernel/irq/msi.c                    |    6 ------
>  5 files changed, 35 deletions(-)
> 
> --- a/arch/x86/include/asm/msi.h
> +++ b/arch/x86/include/asm/msi.h
> @@ -9,6 +9,4 @@ typedef struct irq_alloc_info msi_alloc_
>  int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
>  		    msi_alloc_info_t *arg);
>  
> -void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc);
> -
>  #endif /* _ASM_X86_MSI_H */
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -204,12 +204,6 @@ void native_teardown_msi_irq(unsigned in
>  	irq_domain_free_irqs(irq, 1);
>  }
>  
> -static irq_hw_number_t pci_msi_get_hwirq(struct msi_domain_info *info,
> -					 msi_alloc_info_t *arg)
> -{
> -	return arg->hwirq;
> -}
> -
>  int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
>  		    msi_alloc_info_t *arg)
>  {
> @@ -228,17 +222,8 @@ int pci_msi_prepare(struct irq_domain *d
>  }
>  EXPORT_SYMBOL_GPL(pci_msi_prepare);
>  
> -void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
> -{
> -	arg->desc = desc;
> -	arg->hwirq = pci_msi_domain_calc_hwirq(desc);
> -}
> -EXPORT_SYMBOL_GPL(pci_msi_set_desc);

I think that at this stage, pci_msi_domain_calc_hwirq() can be made
static, as it was only ever exported for this call site. Nice cleanup!

Reviewed-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 17/46] PCI/MSI: Rework pci_msi_domain_calc_hwirq()
  2020-08-26 11:16 ` [patch V2 17/46] PCI/MSI: Rework pci_msi_domain_calc_hwirq() Thomas Gleixner
@ 2020-08-26 20:24   ` Marc Zyngier
  0 siblings, 0 replies; 112+ messages in thread
From: Marc Zyngier @ 2020-08-26 20:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, 26 Aug 2020 12:16:45 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Retrieve the PCI device from the msi descriptor instead of doing so at the
> call sites.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Acked-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 23/46] irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
  2020-08-26 11:16 ` [patch V2 23/46] irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI Thomas Gleixner
@ 2020-08-26 20:42   ` Marc Zyngier
  2020-08-26 20:57     ` Derrick, Jonathan
  0 siblings, 1 reply; 112+ messages in thread
From: Marc Zyngier @ 2020-08-26 20:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, 26 Aug 2020 12:16:51 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> PCI devices behind a VMD bus are not subject to interrupt remapping, but
> the irq domain for VMD MSI cannot be distinguished from a regular PCI/MSI
> irq domain.
> 
> Add a new domain bus token and allow it in the bus token check in
> msi_check_reservation_mode() to keep the functionality the same once VMD
> uses this token.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> ---
>  include/linux/irqdomain.h |    1 +
>  kernel/irq/msi.c          |    7 ++++++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> --- a/include/linux/irqdomain.h
> +++ b/include/linux/irqdomain.h
> @@ -84,6 +84,7 @@ enum irq_domain_bus_token {
>  	DOMAIN_BUS_FSL_MC_MSI,
>  	DOMAIN_BUS_TI_SCI_INTA_MSI,
>  	DOMAIN_BUS_WAKEUP,
> +	DOMAIN_BUS_VMD_MSI,
>  };
>  
>  /**
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -370,8 +370,13 @@ static bool msi_check_reservation_mode(s
>  {
>  	struct msi_desc *desc;
>  
> -	if (domain->bus_token != DOMAIN_BUS_PCI_MSI)
> +	switch(domain->bus_token) {
> +	case DOMAIN_BUS_PCI_MSI:
> +	case DOMAIN_BUS_VMD_MSI:
> +		break;
> +	default:
>  		return false;
> +	}
>  
>  	if (!(info->flags & MSI_FLAG_MUST_REACTIVATE))
>  		return false;

Acked-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 19/46] x86/msi: Use generic MSI domain ops
  2020-08-26 20:21   ` Marc Zyngier
@ 2020-08-26 20:43     ` Thomas Gleixner
  0 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 20:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, Aug 26 2020 at 21:21, Marc Zyngier wrote:
> On Wed, 26 Aug 2020 12:16:47 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> -void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
>> -{
>> -	arg->desc = desc;
>> -	arg->hwirq = pci_msi_domain_calc_hwirq(desc);
>> -}
>> -EXPORT_SYMBOL_GPL(pci_msi_set_desc);
>
> I think that at this stage, pci_msi_domain_calc_hwirq() can be made
> static, as it was only ever exported for this call site. Nice cleanup!

Doh indeed. Let me fix that.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 24/46] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
  2020-08-26 11:16 ` [patch V2 24/46] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI Thomas Gleixner
@ 2020-08-26 20:47   ` Marc Zyngier
  2020-08-31 14:39   ` Jason Gunthorpe
  1 sibling, 0 replies; 112+ messages in thread
From: Marc Zyngier @ 2020-08-26 20:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, 26 Aug 2020 12:16:52 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Devices on the VMD bus use their own MSI irq domain, but it is not
> distinguishable from regular PCI/MSI irq domains. This is required
> to exclude VMD devices from getting the irq domain pointer set by
> interrupt remapping.
> 
> Override the default bus token.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/controller/vmd.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -579,6 +579,12 @@ static int vmd_enable_domain(struct vmd_
>  		return -ENODEV;
>  	}
>  
> +	/*
> +	 * Override the irq domain bus token so the domain can be distinguished
> +	 * from a regular PCI/MSI domain.
> +	 */
> +	irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
> +

One day, we'll be able to set the token at domain creation time. In
the meantime,

Acked-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 15/46] x86/irq: Consolidate DMAR irq allocation
  2020-08-26 18:32     ` Thomas Gleixner
@ 2020-08-26 20:50       ` Thomas Gleixner
  2020-08-28  0:12         ` Dey, Megha
  0 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 20:50 UTC (permalink / raw)
  To: Dey, Megha, LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, Aug 26 2020 at 20:32, Thomas Gleixner wrote:
> On Wed, Aug 26 2020 at 09:50, Megha Dey wrote:
>>> @@ -329,15 +329,15 @@ static struct irq_chip dmar_msi_controll
>>>   static irq_hw_number_t dmar_msi_get_hwirq(struct msi_domain_info *info,
>>>   					  msi_alloc_info_t *arg)
>>>   {
>>> -	return arg->dmar_id;
>>> +	return arg->hwirq;
>>
>> Shouldn't this return the arg->devid which gets set in dmar_alloc_hwirq?
>
> Indeed.

But for simplicity we can set arg->hwirq to the dmar id right in the
alloc function and then once the generic ops are enabled remove the dmar
callback completely.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 23/46] irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
  2020-08-26 20:42   ` Marc Zyngier
@ 2020-08-26 20:57     ` Derrick, Jonathan
  0 siblings, 0 replies; 112+ messages in thread
From: Derrick, Jonathan @ 2020-08-26 20:57 UTC (permalink / raw)
  To: maz, tglx
  Cc: Williams, Dan J, sivanich, wei.liu, haiyangz, Dey, Megha, Lu,
	Baolu, Jiang, Dave, kys, Tian, Kevin, jgross, jgg, sstabellini,
	linux-kernel, x86, rafael, xen-devel, iommu, bhelgaas, linux-pci,
	konrad.wilk, alex.williamson, steve.wahl, boris.ostrovsky,
	gregkh, rja, joro, sthemmin, Pan, Jacob jun, lorenzo.pieralisi,
	linux-hyperv, baolu.lu

On Wed, 2020-08-26 at 21:42 +0100, Marc Zyngier wrote:
> On Wed, 26 Aug 2020 12:16:51 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
> > From: Thomas Gleixner <tglx@linutronix.de>
> > 
> > PCI devices behind a VMD bus are not subject to interrupt remapping, but
> > the irq domain for VMD MSI cannot be distinguished from a regular PCI/MSI
> > irq domain.
> > 
> > Add a new domain bus token and allow it in the bus token check in
> > msi_check_reservation_mode() to keep the functionality the same once VMD
> > uses this token.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > 
> > ---
> >  include/linux/irqdomain.h |    1 +
> >  kernel/irq/msi.c          |    7 ++++++-
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > --- a/include/linux/irqdomain.h
> > +++ b/include/linux/irqdomain.h
> > @@ -84,6 +84,7 @@ enum irq_domain_bus_token {
> >  	DOMAIN_BUS_FSL_MC_MSI,
> >  	DOMAIN_BUS_TI_SCI_INTA_MSI,
> >  	DOMAIN_BUS_WAKEUP,
> > +	DOMAIN_BUS_VMD_MSI,
> >  };
> >  
> >  /**
> > --- a/kernel/irq/msi.c
> > +++ b/kernel/irq/msi.c
> > @@ -370,8 +370,13 @@ static bool msi_check_reservation_mode(s
> >  {
> >  	struct msi_desc *desc;
> >  
> > -	if (domain->bus_token != DOMAIN_BUS_PCI_MSI)
> > +	switch(domain->bus_token) {
> > +	case DOMAIN_BUS_PCI_MSI:
> > +	case DOMAIN_BUS_VMD_MSI:
> > +		break;
> > +	default:
> >  		return false;
> > +	}
> >  
> >  	if (!(info->flags & MSI_FLAG_MUST_REACTIVATE))
> >  		return false;
> 
> Acked-by: Marc Zyngier <maz@kernel.org>
> 
> 	M.
> 

Acked-by: Jon Derrick <jonathan.derrick@intel.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-08-26 11:17 ` [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable Thomas Gleixner
  2020-08-26 15:53   ` Thomas Gleixner
@ 2020-08-26 21:14   ` Marc Zyngier
  2020-08-26 21:27     ` Thomas Gleixner
  2020-08-27 18:20   ` Bjorn Helgaas
  2020-09-25 13:54   ` Qian Cai
  3 siblings, 1 reply; 112+ messages in thread
From: Marc Zyngier @ 2020-08-26 21:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, 26 Aug 2020 12:17:02 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
> requires them or not. Architectures which are fully utilizing hierarchical
> irq domains should never call into that code.
> 
> It's not only architectures which depend on that by implementing one or
> more of the weak functions, there is also a bunch of drivers which relies
> on the weak functions which invoke msi_controller::setup_irq[s] and
> msi_controller::teardown_irq.
> 
> Make the architectures and drivers which rely on them select them in Kconfig
> and if not selected replace them by stub functions which emit a warning and
> fail the PCI/MSI interrupt allocation.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: Make the architectures (and drivers) which need the fallbacks select them
>     and not the other way round (Bjorn).
> ---
>  arch/ia64/Kconfig              |    1 +
>  arch/mips/Kconfig              |    1 +
>  arch/powerpc/Kconfig           |    1 +
>  arch/s390/Kconfig              |    1 +
>  arch/sparc/Kconfig             |    1 +
>  arch/x86/Kconfig               |    1 +
>  drivers/pci/Kconfig            |    3 +++
>  drivers/pci/controller/Kconfig |    3 +++
>  drivers/pci/msi.c              |    3 ++-
>  include/linux/msi.h            |   31 ++++++++++++++++++++++++++-----
>  10 files changed, 40 insertions(+), 6 deletions(-)
> 

[...]

> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ -41,6 +41,7 @@ config PCI_TEGRA
>  	bool "NVIDIA Tegra PCIe controller"
>  	depends on ARCH_TEGRA || COMPILE_TEST
>  	depends on PCI_MSI_IRQ_DOMAIN
> +	select PCI_MSI_ARCH_FALLBACKS
>  	help
>  	  Say Y here if you want support for the PCIe host controller found
>  	  on NVIDIA Tegra SoCs.
> @@ -67,6 +68,7 @@ config PCIE_RCAR_HOST
>  	bool "Renesas R-Car PCIe host controller"
>  	depends on ARCH_RENESAS || COMPILE_TEST
>  	depends on PCI_MSI_IRQ_DOMAIN
> +	select PCI_MSI_ARCH_FALLBACKS
>  	help
>  	  Say Y here if you want PCIe controller support on R-Car SoCs in host
>  	  mode.
> @@ -103,6 +105,7 @@ config PCIE_XILINX_CPM
>  	bool "Xilinx Versal CPM host bridge support"
>  	depends on ARCH_ZYNQMP || COMPILE_TEST
>  	select PCI_HOST_COMMON
> +	select PCI_MSI_ARCH_FALLBACKS

This guy actually doesn't implement MSIs at all (it seems to delegate
them to an ITS present in the system, if I read the DT binding
correctly). However its older brother from the same silicon dealer
seems to need it. The patchlet below should fix it.

diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
index 9ad13919bcaa..f56ff049d469 100644
--- a/drivers/pci/controller/Kconfig
+++ b/drivers/pci/controller/Kconfig
@@ -96,6 +96,7 @@ config PCI_HOST_GENERIC
 
 config PCIE_XILINX
 	bool "Xilinx AXI PCIe host bridge support"
+	select PCI_MSI_ARCH_FALLBACKS
 	depends on OF || COMPILE_TEST
 	help
 	  Say 'Y' here if you want kernel to support the Xilinx AXI PCIe
@@ -105,7 +106,6 @@ config PCIE_XILINX_CPM
 	bool "Xilinx Versal CPM host bridge support"
 	depends on ARCH_ZYNQMP || COMPILE_TEST
 	select PCI_HOST_COMMON
-	select PCI_MSI_ARCH_FALLBACKS
 	help
 	  Say 'Y' here if you want kernel support for the
 	  Xilinx Versal CPM host bridge.


With that fixed,

Acked-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply related	[flat|nested] 112+ messages in thread

* Re: [patch V2 04/46] genirq/chip: Use the first chip in irq_chip_compose_msi_msg()
  2020-08-26 19:50   ` Marc Zyngier
@ 2020-08-26 21:19     ` Thomas Gleixner
  2020-08-26 21:32       ` Marc Zyngier
  0 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 21:19 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, Aug 26 2020 at 20:50, Marc Zyngier wrote:
> On Wed, 26 Aug 2020 12:16:32 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> ---
>> V2: New patch. Note, that this might break other stuff which relies on the
>>     current behaviour, but the hierarchy composition of DT based chips is
>>     really hard to follow.
>
> Grepping around, I don't think there is any occurrence of two irqchips
> providing irq_compose_msi() that can share a hierarchy on any real
> system, so we should be fine. Famous last words.

Knocking on wood :)

>>  #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
>> -	for (; data; data = data->parent_data)
>> -#endif
>> -		if (data->chip && data->chip->irq_compose_msi_msg)
>> +	for (; data; data = data->parent_data) {
>> +		if (data->chip && data->chip->irq_compose_msi_msg) {
>>  			pos = data;
>> +			break;
>> +		}
>> +	}
>> +#else
>> +	if (data->chip && data->chip->irq_compose_msi_msg)
>> +		pos = data;
>> +#endif
>>  	if (!pos)
>>  		return -ENOSYS;
>
> Is it just me, or is this last change more complex than it ought to
> be?

Kinda.

> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 857f5f4c8098..25e18b73699c 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -1544,7 +1544,7 @@ int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>  	struct irq_data *pos = NULL;
>  
>  #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
> -	for (; data; data = data->parent_data)
> +	for (; data && !pos; data = data->parent_data)
>  #endif
>  		if (data->chip && data->chip->irq_compose_msi_msg)
>  			pos = data;
>
> Though the for loop in a #ifdef in admittedly an acquired taste...

Checking !pos is simpler obviously. That doesn't make me hate the loop
in the #ifdef less. :)

What about the below?

Thanks,

        tglx
---
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -473,6 +473,15 @@ static inline void irq_domain_deactivate
 }
 #endif
 
+static inline struct irq_data *irqd_get_parent_data(struct irq_data *irqd)
+{
+#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
+	return irqd->parent_data;
+#else
+	return NULL;
+#endif
+}
+
 #ifdef CONFIG_GENERIC_IRQ_DEBUGFS
 #include <linux/debugfs.h>
 
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1541,18 +1541,17 @@ EXPORT_SYMBOL_GPL(irq_chip_release_resou
  */
 int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 {
-	struct irq_data *pos = NULL;
+	struct irq_data *pos;
 
-#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
-	for (; data; data = data->parent_data)
-#endif
+	for (pos = NULL; !pos && data; data = irqd_get_parent_data(data)) {
 		if (data->chip && data->chip->irq_compose_msi_msg)
 			pos = data;
+	}
+
 	if (!pos)
 		return -ENOSYS;
 
 	pos->chip->irq_compose_msi_msg(pos, msg);
-
 	return 0;
 }
 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 41/46] platform-msi: Provide default irq_chip:: Ack
  2020-08-26 11:17 ` [patch V2 41/46] platform-msi: Provide default irq_chip:: Ack Thomas Gleixner
@ 2020-08-26 21:25   ` Marc Zyngier
  0 siblings, 0 replies; 112+ messages in thread
From: Marc Zyngier @ 2020-08-26 21:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, 26 Aug 2020 12:17:09 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> For the upcoming device MSI support it's required to have a default
> irq_chip::ack implementation (irq_chip_ack_parent) so the drivers do not
> need to care.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> ---
>  drivers/base/platform-msi.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> --- a/drivers/base/platform-msi.c
> +++ b/drivers/base/platform-msi.c
> @@ -95,6 +95,8 @@ static void platform_msi_update_chip_ops
>  		chip->irq_mask = irq_chip_mask_parent;
>  	if (!chip->irq_unmask)
>  		chip->irq_unmask = irq_chip_unmask_parent;
> +	if (!chip->irq_ack)
> +		chip->irq_ack = irq_chip_ack_parent;
>  	if (!chip->irq_eoi)
>  		chip->irq_eoi = irq_chip_eoi_parent;
>  	if (!chip->irq_set_affinity)
> 
> 

Acked-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-08-26 21:14   ` Marc Zyngier
@ 2020-08-26 21:27     ` Thomas Gleixner
  0 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-26 21:27 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, Aug 26 2020 at 22:14, Marc Zyngier wrote:
> On Wed, 26 Aug 2020 12:17:02 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> @@ -103,6 +105,7 @@ config PCIE_XILINX_CPM
>>  	bool "Xilinx Versal CPM host bridge support"
>>  	depends on ARCH_ZYNQMP || COMPILE_TEST
>>  	select PCI_HOST_COMMON
>> +	select PCI_MSI_ARCH_FALLBACKS
>
> This guy actually doesn't implement MSIs at all (it seems to delegate
> them to an ITS present in the system, if I read the DT binding
> correctly). However its older brother from the same silicon dealer
> seems to need it. The patchlet below should fix it.

Gah, at some point my eyes went squared and I lost track..


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 04/46] genirq/chip: Use the first chip in irq_chip_compose_msi_msg()
  2020-08-26 21:19     ` Thomas Gleixner
@ 2020-08-26 21:32       ` Marc Zyngier
  0 siblings, 0 replies; 112+ messages in thread
From: Marc Zyngier @ 2020-08-26 21:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, 26 Aug 2020 22:19:56 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Wed, Aug 26 2020 at 20:50, Marc Zyngier wrote:
> > On Wed, 26 Aug 2020 12:16:32 +0100,
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >> ---
> >> V2: New patch. Note, that this might break other stuff which relies on the
> >>     current behaviour, but the hierarchy composition of DT based chips is
> >>     really hard to follow.
> >

[...]

> What about the below?
> 
> Thanks,
> 
>         tglx
> ---
> --- a/kernel/irq/internals.h
> +++ b/kernel/irq/internals.h
> @@ -473,6 +473,15 @@ static inline void irq_domain_deactivate
>  }
>  #endif
>  
> +static inline struct irq_data *irqd_get_parent_data(struct irq_data *irqd)
> +{
> +#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
> +	return irqd->parent_data;
> +#else
> +	return NULL;
> +#endif
> +}
> +

We obviously should have had this forever.

>  #ifdef CONFIG_GENERIC_IRQ_DEBUGFS
>  #include <linux/debugfs.h>
>  
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -1541,18 +1541,17 @@ EXPORT_SYMBOL_GPL(irq_chip_release_resou
>   */
>  int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>  {
> -	struct irq_data *pos = NULL;
> +	struct irq_data *pos;
>  
> -#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
> -	for (; data; data = data->parent_data)
> -#endif
> +	for (pos = NULL; !pos && data; data = irqd_get_parent_data(data)) {
>  		if (data->chip && data->chip->irq_compose_msi_msg)
>  			pos = data;
> +	}
> +
>  	if (!pos)
>  		return -ENOSYS;
>  
>  	pos->chip->irq_compose_msi_msg(pos, msg);
> -
>  	return 0;
>  }

Perfect, ship it! ;-)

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 29/46] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
  2020-08-26 19:47     ` Thomas Gleixner
@ 2020-08-26 21:33       ` Marc Zyngier
  0 siblings, 0 replies; 112+ messages in thread
From: Marc Zyngier @ 2020-08-26 21:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, 26 Aug 2020 20:47:38 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Wed, Aug 26 2020 at 20:06, Marc Zyngier wrote:
> > On Wed, 26 Aug 2020 12:16:57 +0100,
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >>  /**
> >> - * msi_domain_free_irqs - Free interrupts from a MSI interrupt @domain associated tp @dev
> >> - * @domain:	The domain to managing the interrupts
> >> + * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain
> >> + * @domain:	The domain to allocate from
> >>   * @dev:	Pointer to device struct of the device for which the interrupts
> >> - *		are free
> >> + *		are allocated
> >> + * @nvec:	The number of interrupts to allocate
> >> + *
> >> + * Returns 0 on success or an error code.
> >>   */
> >> -void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
> >> +int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
> >> +			  int nvec)
> >> +{
> >> +	struct msi_domain_info *info = domain->host_data;
> >> +	struct msi_domain_ops *ops = info->ops;
> >
> > Rework leftovers, I imagine.
> 
> Hmm, no. How would it call ops->domain_alloc_irqs() without getting the
> ops. I know, that the diff is horrible, but don't blame me for it. diff
> sucks at times.

I can't read. Time to put the laptop away!

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 27/46] x86/xen: Rework MSI teardown
  2020-08-26 11:16 ` [patch V2 27/46] x86/xen: Rework MSI teardown Thomas Gleixner
@ 2020-08-27  7:46   ` Jürgen Groß
  0 siblings, 0 replies; 112+ messages in thread
From: Jürgen Groß @ 2020-08-27  7:46 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On 26.08.20 13:16, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> X86 cannot store the irq domain pointer in struct device without breaking
> XEN because the irq domain pointer takes precedence over arch_*_msi_irqs()
> fallbacks.
> 
> XENs MSI teardown relies on default_teardown_msi_irqs() which invokes
> arch_teardown_msi_irq(). default_teardown_msi_irqs() is a trivial iterator
> over the msi entries associated to a device.
> 
> Implement this loop in xen_teardown_msi_irqs() to prepare for removal of
> the fallbacks for X86.
> 
> This is a preparatory step to wrap XEN MSI alloc/free into a irq domain
> which in turn allows to store the irq domain pointer in struct device and
> to use the irq domain functions directly.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 28/46] x86/xen: Consolidate XEN-MSI init
  2020-08-26 11:16 ` [patch V2 28/46] x86/xen: Consolidate XEN-MSI init Thomas Gleixner
@ 2020-08-27  7:47   ` Jürgen Groß
  0 siblings, 0 replies; 112+ messages in thread
From: Jürgen Groß @ 2020-08-27  7:47 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On 26.08.20 13:16, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> X86 cannot store the irq domain pointer in struct device without breaking
> XEN because the irq domain pointer takes precedence over arch_*_msi_irqs()
> fallbacks.
> 
> To achieve this XEN MSI interrupt management needs to be wrapped into an
> irq domain.
> 
> Move the x86_msi ops setup into a single function to prepare for this.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 43/46] genirq/msi: Provide and use msi_domain_set_default_info_flags()
  2020-08-26 11:17 ` [patch V2 43/46] genirq/msi: Provide and use msi_domain_set_default_info_flags() Thomas Gleixner
@ 2020-08-27  8:17   ` Marc Zyngier
  2020-08-28 18:42     ` Thomas Gleixner
  0 siblings, 1 reply; 112+ messages in thread
From: Marc Zyngier @ 2020-08-27  8:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On 2020-08-26 12:17, Thomas Gleixner wrote:
> MSI interrupts have some common flags which should be set not only for
> PCI/MSI interrupts.
> 
> Move the PCI/MSI flag setting into a common function so it can be 
> reused.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: New patch
> ---
>  drivers/pci/msi.c   |    7 +------
>  include/linux/msi.h |    1 +
>  kernel/irq/msi.c    |   24 ++++++++++++++++++++++++
>  3 files changed, 26 insertions(+), 6 deletions(-)
> 
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1469,12 +1469,7 @@ struct irq_domain *pci_msi_create_irq_do
>  	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
>  		pci_msi_domain_update_chip_ops(info);
> 
> -	info->flags |= MSI_FLAG_ACTIVATE_EARLY;
> -	if (IS_ENABLED(CONFIG_GENERIC_IRQ_RESERVATION_MODE))
> -		info->flags |= MSI_FLAG_MUST_REACTIVATE;
> -
> -	/* PCI-MSI is oneshot-safe */
> -	info->chip->flags |= IRQCHIP_ONESHOT_SAFE;
> +	msi_domain_set_default_info_flags(info);
> 
>  	domain = msi_create_irq_domain(fwnode, info, parent);
>  	if (!domain)
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -410,6 +410,7 @@ int platform_msi_domain_alloc(struct irq
>  void platform_msi_domain_free(struct irq_domain *domain, unsigned int 
> virq,
>  			      unsigned int nvec);
>  void *platform_msi_get_host_data(struct irq_domain *domain);
> +void msi_domain_set_default_info_flags(struct msi_domain_info *info);
>  #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
> 
>  #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -70,6 +70,30 @@ void get_cached_msi_msg(unsigned int irq
>  EXPORT_SYMBOL_GPL(get_cached_msi_msg);
> 
>  #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
> +void msi_domain_set_default_info_flags(struct msi_domain_info *info)
> +{
> +	/* Required so that a device latches a valid MSI message on startup 
> */
> +	info->flags |= MSI_FLAG_ACTIVATE_EARLY;

As far as I remember the story behind this flag (it's been a while),
it was working around a PCI-specific issue, hence being located in
the PCI code.

Now, the "program the MSI before enabling it" concept makes sense no 
matter
what bus this is on, and I wonder why we are even keeping this flag 
around.
Can't we just drop it together with the check in 
msi_domain_alloc_irqs()?

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 01/46] iommu/amd: Prevent NULL pointer dereference
  2020-08-26 11:16 ` [patch V2 01/46] iommu/amd: Prevent NULL pointer dereference Thomas Gleixner
@ 2020-08-27 14:57   ` Joerg Roedel
  0 siblings, 0 replies; 112+ messages in thread
From: Joerg Roedel @ 2020-08-27 14:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, iommu, linux-hyperv, Haiyang Zhang, Jon Derrick,
	Lu Baolu, Wei Liu, K. Y. Srinivasan, Stephen Hemminger,
	Steve Wahl, Dimitri Sivanich, Russ Anderson, linux-pci,
	Bjorn Helgaas, Lorenzo Pieralisi, Konrad Rzeszutek Wilk,
	xen-devel, Juergen Gross, Boris Ostrovsky, Stefano Stabellini,
	Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

On Wed, Aug 26, 2020 at 01:16:29PM +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Dereferencing irq_data before checking it for NULL is suboptimal.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Acked-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-08-26 11:17 ` [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable Thomas Gleixner
  2020-08-26 15:53   ` Thomas Gleixner
  2020-08-26 21:14   ` Marc Zyngier
@ 2020-08-27 18:20   ` Bjorn Helgaas
  2020-08-28 11:21     ` Lorenzo Pieralisi
  2020-08-28 18:29     ` Thomas Gleixner
  2020-09-25 13:54   ` Qian Cai
  3 siblings, 2 replies; 112+ messages in thread
From: Bjorn Helgaas @ 2020-08-27 18:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams,
	Rob Herring

[+cc Rob,
cover https://lore.kernel.org/r/20200826111628.794979401@linutronix.de/
this  https://lore.kernel.org/r/20200826112333.992429909@linutronix.de/]

On Wed, Aug 26, 2020 at 01:17:02PM +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
> requires them or not. Architectures which are fully utilizing hierarchical
> irq domains should never call into that code.
> 
> It's not only architectures which depend on that by implementing one or
> more of the weak functions, there is also a bunch of drivers which relies
> on the weak functions which invoke msi_controller::setup_irq[s] and
> msi_controller::teardown_irq.
> 
> Make the architectures and drivers which rely on them select them in Kconfig
> and if not selected replace them by stub functions which emit a warning and
> fail the PCI/MSI interrupt allocation.

Sorry, I really don't understand this, so these are probably stupid
questions.

If CONFIG_PCI_MSI_ARCH_FALLBACKS is defined, we will supply
implementations of:

  arch_setup_msi_irq
  arch_teardown_msi_irq
  arch_setup_msi_irqs
  arch_teardown_msi_irqs
  default_teardown_msi_irqs    # non-weak

You select CONFIG_PCI_MSI_ARCH_FALLBACKS for ia64, mips, powerpc,
s390, sparc, and x86.  I see that all of those arches implement at
least one of the functions above.  But x86 doesn't and I can't figure
out why it needs to select CONFIG_PCI_MSI_ARCH_FALLBACKS.

I assume there's a way to convert these arches to hierarchical irq
domains so they wouldn't need this at all?  Is there a sample
conversion to look at?

And I can't figure out what's special about tegra, rcar, and xilinx
that makes them need it as well.  Is there something I could grep for
to identify them?  Is there a way to convert them so they don't need
it?

> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -193,17 +193,38 @@ void pci_msi_mask_irq(struct irq_data *d
>  void pci_msi_unmask_irq(struct irq_data *data);
>  
>  /*
> - * The arch hooks to setup up msi irqs. Those functions are
> - * implemented as weak symbols so that they /can/ be overriden by
> - * architecture specific code if needed.
> + * The arch hooks to setup up msi irqs. Default functions are implemented

s/msi/MSI/ to match the one below.

> + * as weak symbols so that they /can/ be overriden by architecture specific
> + * code if needed. These hooks must be enabled by the architecture or by
> + * drivers which depend on them via msi_controller based MSI handling.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 15/46] x86/irq: Consolidate DMAR irq allocation
  2020-08-26 20:50       ` Thomas Gleixner
@ 2020-08-28  0:12         ` Dey, Megha
  0 siblings, 0 replies; 112+ messages in thread
From: Dey, Megha @ 2020-08-28  0:12 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

Hi Thomas,

On 8/26/2020 1:50 PM, Thomas Gleixner wrote:
> On Wed, Aug 26 2020 at 20:32, Thomas Gleixner wrote:
>> On Wed, Aug 26 2020 at 09:50, Megha Dey wrote:
>>>> @@ -329,15 +329,15 @@ static struct irq_chip dmar_msi_controll
>>>>    static irq_hw_number_t dmar_msi_get_hwirq(struct msi_domain_info *info,
>>>>    					  msi_alloc_info_t *arg)
>>>>    {
>>>> -	return arg->dmar_id;
>>>> +	return arg->hwirq;
>>> Shouldn't this return the arg->devid which gets set in dmar_alloc_hwirq?
>> Indeed.
> But for simplicity we can set arg->hwirq to the dmar id right in the
> alloc function and then once the generic ops are enabled remove the dmar
> callback completely
True, can get rid of more code that way.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 29/46] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
  2020-08-26 11:16 ` [patch V2 29/46] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs() Thomas Gleixner
  2020-08-26 19:06   ` Marc Zyngier
@ 2020-08-28  0:24   ` Dey, Megha
  1 sibling, 0 replies; 112+ messages in thread
From: Dey, Megha @ 2020-08-28  0:24 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

Hi Thomas,

On 8/26/2020 4:16 AM, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> To support MSI irq domains which do not fit at all into the regular MSI
> irqdomain scheme, like the XEN MSI interrupt management for PV/HVM/DOM0,
> it's necessary to allow to override the alloc/free implementation.
>
> This is a preperatory step to switch X86 away from arch_*_msi_irqs() and
> store the irq domain pointer right in struct device.
>
> No functional change for existing MSI irq domain users.
>
> Aside of the evil XEN wrapper this is also useful for special MSI domains
> which need to do extra alloc/free work before/after calling the generic
> core function. Work like allocating/freeing MSI descriptors, MSI storage
> space etc.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> ---
>   include/linux/msi.h |   27 ++++++++++++++++++++
>   kernel/irq/msi.c    |   70 +++++++++++++++++++++++++++++++++++-----------------
>   2 files changed, 75 insertions(+), 22 deletions(-)
>
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -241,6 +241,10 @@ struct msi_domain_info;
>    * @msi_finish:		Optional callback to finalize the allocation
>    * @set_desc:		Set the msi descriptor for an interrupt
>    * @handle_error:	Optional error handler if the allocation fails
> + * @domain_alloc_irqs:	Optional function to override the default allocation
> + *			function.
> + * @domain_free_irqs:	Optional function to override the default free
> + *			function.
>    *
>    * @get_hwirq, @msi_init and @msi_free are callbacks used by
>    * msi_create_irq_domain() and related interfaces
> @@ -248,6 +252,22 @@ struct msi_domain_info;
>    * @msi_check, @msi_prepare, @msi_finish, @set_desc and @handle_error
>    * are callbacks used by msi_domain_alloc_irqs() and related
>    * interfaces which are based on msi_desc.
> + *
> + * @domain_alloc_irqs, @domain_free_irqs can be used to override the
> + * default allocation/free functions (__msi_domain_alloc/free_irqs). This
> + * is initially for a wrapper around XENs seperate MSI universe which can't
> + * be wrapped into the regular irq domains concepts by mere mortals.  This
> + * allows to universally use msi_domain_alloc/free_irqs without having to
> + * special case XEN all over the place.
> + *
> + * Contrary to other operations @domain_alloc_irqs and @domain_free_irqs
> + * are set to the default implementation if NULL and even when
> + * MSI_FLAG_USE_DEF_DOM_OPS is not set to avoid breaking existing users and
> + * because these callbacks are obviously mandatory.
> + *
> + * This is NOT meant to be abused, but it can be useful to build wrappers
> + * for specialized MSI irq domains which need extra work before and after
> + * calling __msi_domain_alloc_irqs()/__msi_domain_free_irqs().
>    */
>   struct msi_domain_ops {
>   	irq_hw_number_t	(*get_hwirq)(struct msi_domain_info *info,
> @@ -270,6 +290,10 @@ struct msi_domain_ops {
>   				    struct msi_desc *desc);
>   	int		(*handle_error)(struct irq_domain *domain,
>   					struct msi_desc *desc, int error);
> +	int		(*domain_alloc_irqs)(struct irq_domain *domain,
> +					     struct device *dev, int nvec);
> +	void		(*domain_free_irqs)(struct irq_domain *domain,
> +					    struct device *dev);
>   };
>   
>   /**
> @@ -327,8 +351,11 @@ int msi_domain_set_affinity(struct irq_d
>   struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode,
>   					 struct msi_domain_info *info,
>   					 struct irq_domain *parent);
> +int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
> +			    int nvec);
>   int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
>   			  int nvec);
> +void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev);
>   void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev);
>   struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain);
>   
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -229,11 +229,13 @@ static int msi_domain_ops_check(struct i
>   }
>   
>   static struct msi_domain_ops msi_domain_ops_default = {
> -	.get_hwirq	= msi_domain_ops_get_hwirq,
> -	.msi_init	= msi_domain_ops_init,
> -	.msi_check	= msi_domain_ops_check,
> -	.msi_prepare	= msi_domain_ops_prepare,
> -	.set_desc	= msi_domain_ops_set_desc,
> +	.get_hwirq		= msi_domain_ops_get_hwirq,
> +	.msi_init		= msi_domain_ops_init,
> +	.msi_check		= msi_domain_ops_check,
> +	.msi_prepare		= msi_domain_ops_prepare,
> +	.set_desc		= msi_domain_ops_set_desc,
> +	.domain_alloc_irqs	= __msi_domain_alloc_irqs,
> +	.domain_free_irqs	= __msi_domain_free_irqs,
>   };
>   
>   static void msi_domain_update_dom_ops(struct msi_domain_info *info)
> @@ -245,6 +247,14 @@ static void msi_domain_update_dom_ops(st
>   		return;
>   	}
>   
> +	if (ops->domain_alloc_irqs == NULL)
> +		ops->domain_alloc_irqs = msi_domain_ops_default.domain_alloc_irqs;
> +	if (ops->domain_free_irqs == NULL)
> +		ops->domain_free_irqs = msi_domain_ops_default.domain_free_irqs;
> +
> +	if (!(info->flags & MSI_FLAG_USE_DEF_DOM_OPS))
> +		return;
> +
>   	if (ops->get_hwirq == NULL)
>   		ops->get_hwirq = msi_domain_ops_default.get_hwirq;
>   	if (ops->msi_init == NULL)
> @@ -278,8 +288,7 @@ struct irq_domain *msi_create_irq_domain
>   {
>   	struct irq_domain *domain;
>   
> -	if (info->flags & MSI_FLAG_USE_DEF_DOM_OPS)
> -		msi_domain_update_dom_ops(info);
> +	msi_domain_update_dom_ops(info);
>   	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
>   		msi_domain_update_chip_ops(info);
>   
> @@ -386,17 +395,8 @@ static bool msi_check_reservation_mode(s
>   	return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit;
>   }
>   
> -/**
> - * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain
> - * @domain:	The domain to allocate from
> - * @dev:	Pointer to device struct of the device for which the interrupts
> - *		are allocated
> - * @nvec:	The number of interrupts to allocate
> - *
> - * Returns 0 on success or an error code.
> - */
> -int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
> -			  int nvec)
> +int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
> +			    int nvec)
>   {
>   	struct msi_domain_info *info = domain->host_data;
>   	struct msi_domain_ops *ops = info->ops;
> @@ -490,12 +490,24 @@ int msi_domain_alloc_irqs(struct irq_dom
>   }
>   
>   /**
> - * msi_domain_free_irqs - Free interrupts from a MSI interrupt @domain associated tp @dev
> - * @domain:	The domain to managing the interrupts
> + * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain
> + * @domain:	The domain to allocate from
>    * @dev:	Pointer to device struct of the device for which the interrupts
> - *		are free
> + *		are allocated
> + * @nvec:	The number of interrupts to allocate
> + *
> + * Returns 0 on success or an error code.
>    */
> -void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
> +int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
> +			  int nvec)
> +{
> +	struct msi_domain_info *info = domain->host_data;
> +	struct msi_domain_ops *ops = info->ops;
> +
> +	return ops->domain_alloc_irqs(domain, dev, nvec);
> +}
> +

Since, our upcoming driver will directly call this API, an 
EXPORT_SYMBOL_GPL tag would be required.

Currently there is no use case. I was wondering if we should add this 
change while submitting the

idxd/ims patches or would you add this to this patch?

> +void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
>   {
>   	struct msi_desc *desc;
>   
> @@ -513,6 +525,20 @@ void msi_domain_free_irqs(struct irq_dom
>   }
>   
>   /**
> + * __msi_domain_free_irqs - Free interrupts from a MSI interrupt @domain associated tp @dev
> + * @domain:	The domain to managing the interrupts
> + * @dev:	Pointer to device struct of the device for which the interrupts
> + *		are free
> + */
> +void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
> +{
> +	struct msi_domain_info *info = domain->host_data;
> +	struct msi_domain_ops *ops = info->ops;
> +
> +	return ops->domain_free_irqs(domain, dev);
> +}
> +
> +/**
>    * msi_get_domain_info - Get the MSI interrupt domain info for @domain
>    * @domain:	The interrupt domain to retrieve data from
>    *
>
>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-08-27 18:20   ` Bjorn Helgaas
@ 2020-08-28 11:21     ` Lorenzo Pieralisi
  2020-08-28 12:19       ` Jason Gunthorpe
  2020-08-28 18:29     ` Thomas Gleixner
  1 sibling, 1 reply; 112+ messages in thread
From: Lorenzo Pieralisi @ 2020-08-28 11:21 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Thomas Gleixner, LKML, x86, Joerg Roedel, iommu, linux-hyperv,
	Haiyang Zhang, Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Konrad Rzeszutek Wilk, xen-devel,
	Juergen Gross, Boris Ostrovsky, Stefano Stabellini, Marc Zyngier,
	Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams, Rob Herring

On Thu, Aug 27, 2020 at 01:20:40PM -0500, Bjorn Helgaas wrote:

[...]

> And I can't figure out what's special about tegra, rcar, and xilinx
> that makes them need it as well.  Is there something I could grep for
> to identify them?  Is there a way to convert them so they don't need
> it?

I think DT binding and related firmware support are needed to setup the
MSI IRQ domains correctly, there is nothing special about tegra, rcar
and xilinx AFAIK (well, all native host controllers MSI handling is
*special* just to be polite but let's gloss over this for the time
being).

struct msi_controller, to answer the first question.

I have doubts about pci_mvebu too, they do allocate an msi_controller
but without methods so it looks pretty much useless.

Hyper-V code too seems questionable, maybe there is room for more
clean-ups.

Lorenzo

> > --- a/include/linux/msi.h
> > +++ b/include/linux/msi.h
> > @@ -193,17 +193,38 @@ void pci_msi_mask_irq(struct irq_data *d
> >  void pci_msi_unmask_irq(struct irq_data *data);
> >  
> >  /*
> > - * The arch hooks to setup up msi irqs. Those functions are
> > - * implemented as weak symbols so that they /can/ be overriden by
> > - * architecture specific code if needed.
> > + * The arch hooks to setup up msi irqs. Default functions are implemented
> 
> s/msi/MSI/ to match the one below.
> 
> > + * as weak symbols so that they /can/ be overriden by architecture specific
> > + * code if needed. These hooks must be enabled by the architecture or by
> > + * drivers which depend on them via msi_controller based MSI handling.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (45 preceding siblings ...)
  2020-08-26 11:17 ` [patch V2 46/46] irqchip: Add IMS (Interrupt Message Storm) driver - NOT FOR MERGING Thomas Gleixner
@ 2020-08-28 11:41 ` Joerg Roedel
  2020-08-31  0:51 ` Lu Baolu
                   ` (6 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Joerg Roedel @ 2020-08-28 11:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, iommu, linux-hyperv, Haiyang Zhang, Jon Derrick,
	Lu Baolu, Wei Liu, K. Y. Srinivasan, Stephen Hemminger,
	Steve Wahl, Dimitri Sivanich, Russ Anderson, linux-pci,
	Bjorn Helgaas, Lorenzo Pieralisi, Konrad Rzeszutek Wilk,
	xen-devel, Juergen Gross, Boris Ostrovsky, Stefano Stabellini,
	Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

On Wed, Aug 26, 2020 at 01:16:28PM +0200, Thomas Gleixner wrote:
> This is the second version of providing a base to support device MSI (non
> PCI based) and on top of that support for IMS (Interrupt Message Storm)
> based devices in a halfways architecture independent way.
> 
> The first version can be found here:
> 
>     https://lore.kernel.org/r/20200821002424.119492231@linutronix.de
> 
> It's still a mixed bag of bug fixes, cleanups and general improvements
> which are worthwhile independent of device MSI.
> 
> There are quite a bunch of issues to solve:
> 
>   - X86 does not use the device::msi_domain pointer for historical reasons
>     and due to XEN, which makes it impossible to create an architecture
>     agnostic device MSI infrastructure.
> 
>   - X86 has it's own msi_alloc_info data type which is pointlessly
>     different from the generic version and does not allow to share code.
> 
>   - The logic of composing MSI messages in an hierarchy is busted at the
>     core level and of course some (x86) drivers depend on that.
> 
>   - A few minor shortcomings as usual
> 
> This series addresses that in several steps:

For all IOMMU changes:

	Acked-by: Joerg Roedel <jroedel@suse.de>


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-08-28 11:21     ` Lorenzo Pieralisi
@ 2020-08-28 12:19       ` Jason Gunthorpe
  2020-08-28 12:47         ` Marc Zyngier
  0 siblings, 1 reply; 112+ messages in thread
From: Jason Gunthorpe @ 2020-08-28 12:19 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Bjorn Helgaas, Thomas Gleixner, LKML, x86, Joerg Roedel, iommu,
	linux-hyperv, Haiyang Zhang, Jon Derrick, Lu Baolu, Wei Liu,
	K. Y. Srinivasan, Stephen Hemminger, Steve Wahl,
	Dimitri Sivanich, Russ Anderson, linux-pci, Bjorn Helgaas,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams, Rob Herring

On Fri, Aug 28, 2020 at 12:21:42PM +0100, Lorenzo Pieralisi wrote:
> On Thu, Aug 27, 2020 at 01:20:40PM -0500, Bjorn Helgaas wrote:
> 
> [...]
> 
> > And I can't figure out what's special about tegra, rcar, and xilinx
> > that makes them need it as well.  Is there something I could grep for
> > to identify them?  Is there a way to convert them so they don't need
> > it?
> 
> I think DT binding and related firmware support are needed to setup the
> MSI IRQ domains correctly, there is nothing special about tegra, rcar
> and xilinx AFAIK (well, all native host controllers MSI handling is
> *special* just to be polite but let's gloss over this for the time
> being).
> 
> struct msi_controller, to answer the first question.
> 
> I have doubts about pci_mvebu too, they do allocate an msi_controller
> but without methods so it looks pretty much useless.

Oh, I did once know things about mvebu.. 

I suspect the msi controller pointer assignment is dead code at this
point. The only implementation of MSI with that PCI root port is
drivers/irqchip/irq-armada-370-xp.c which looks like it uses
irq_domain.

Actually looks like things are very close to eliminating
msi_controller.

This is dead code, can't find a setter for hw_pci->msi_ctrl:

arch/arm/include/asm/mach/pci.h:        struct msi_controller *msi_ctrl;
arch/arm/kernel/bios32.c:                               bridge->msi = hw->msi_ctrl;

This is probably just copying NULL from one place to another:

drivers/pci/controller/pci-mvebu.c:     struct msi_controller *msi;

These need conversion to irq_domain (right?):

drivers/pci/controller/pci-hyperv.c:    struct msi_controller msi_chip;
drivers/pci/controller/pci-tegra.c:     struct msi_controller chip;
drivers/pci/controller/pcie-rcar-host.c:        struct msi_controller chip;
drivers/pci/controller/pcie-xilinx.c:static struct msi_controller xilinx_pcie_msi_chip = {

Then the stuff in drivers/pci/msi.c can go away.

So the arch_setup_msi_irq/etc is not really an arch hook, but some
infrastructure to support those 4 PCI root port drivers.

Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-08-28 12:19       ` Jason Gunthorpe
@ 2020-08-28 12:47         ` Marc Zyngier
  2020-08-28 12:54           ` Jason Gunthorpe
  0 siblings, 1 reply; 112+ messages in thread
From: Marc Zyngier @ 2020-08-28 12:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lorenzo Pieralisi, Bjorn Helgaas, Thomas Gleixner, LKML, x86,
	Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang, Jon Derrick,
	Lu Baolu, Wei Liu, K. Y. Srinivasan, Stephen Hemminger,
	Steve Wahl, Dimitri Sivanich, Russ Anderson, linux-pci,
	Bjorn Helgaas, Konrad Rzeszutek Wilk, xen-devel, Juergen Gross,
	Boris Ostrovsky, Stefano Stabellini, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams, Rob Herring

Hi Jason,

On 2020-08-28 13:19, Jason Gunthorpe wrote:
> On Fri, Aug 28, 2020 at 12:21:42PM +0100, Lorenzo Pieralisi wrote:
>> On Thu, Aug 27, 2020 at 01:20:40PM -0500, Bjorn Helgaas wrote:
>> 
>> [...]
>> 
>> > And I can't figure out what's special about tegra, rcar, and xilinx
>> > that makes them need it as well.  Is there something I could grep for
>> > to identify them?  Is there a way to convert them so they don't need
>> > it?
>> 
>> I think DT binding and related firmware support are needed to setup 
>> the
>> MSI IRQ domains correctly, there is nothing special about tegra, rcar
>> and xilinx AFAIK (well, all native host controllers MSI handling is
>> *special* just to be polite but let's gloss over this for the time
>> being).
>> 
>> struct msi_controller, to answer the first question.
>> 
>> I have doubts about pci_mvebu too, they do allocate an msi_controller
>> but without methods so it looks pretty much useless.
> 
> Oh, I did once know things about mvebu..
> 
> I suspect the msi controller pointer assignment is dead code at this
> point. The only implementation of MSI with that PCI root port is
> drivers/irqchip/irq-armada-370-xp.c which looks like it uses
> irq_domain.
> 
> Actually looks like things are very close to eliminating
> msi_controller.
> 
> This is dead code, can't find a setter for hw_pci->msi_ctrl:
> 
> arch/arm/include/asm/mach/pci.h:        struct msi_controller 
> *msi_ctrl;
> arch/arm/kernel/bios32.c:                               bridge->msi =
> hw->msi_ctrl;
> 
> This is probably just copying NULL from one place to another:
> 
> drivers/pci/controller/pci-mvebu.c:     struct msi_controller *msi;
> 
> These need conversion to irq_domain (right?):
> 
> drivers/pci/controller/pci-hyperv.c:    struct msi_controller msi_chip;
> drivers/pci/controller/pci-tegra.c:     struct msi_controller chip;
> drivers/pci/controller/pcie-rcar-host.c:        struct msi_controller 
> chip;
> drivers/pci/controller/pcie-xilinx.c:static struct msi_controller
> xilinx_pcie_msi_chip = {
> 
> Then the stuff in drivers/pci/msi.c can go away.
> 
> So the arch_setup_msi_irq/etc is not really an arch hook, but some
> infrastructure to support those 4 PCI root port drivers.

I happen to have a *really old* patch addressing Tegra [1], which
I was never able to test (no HW). Rebasing it shouldn't be too hard,
and maybe you can find someone internally willing to give it a spin?

That'd be a good start towards the removal of this cruft.

Thanks,

         M.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/kill-msi-controller&id=83b3602fcee7972b9d549ed729b56ec28de16081
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-08-28 12:47         ` Marc Zyngier
@ 2020-08-28 12:54           ` Jason Gunthorpe
  2020-08-28 13:52             ` Marc Zyngier
  0 siblings, 1 reply; 112+ messages in thread
From: Jason Gunthorpe @ 2020-08-28 12:54 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Lorenzo Pieralisi, Bjorn Helgaas, Thomas Gleixner, LKML, x86,
	Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang, Jon Derrick,
	Lu Baolu, Wei Liu, K. Y. Srinivasan, Stephen Hemminger,
	Steve Wahl, Dimitri Sivanich, Russ Anderson, linux-pci,
	Bjorn Helgaas, Konrad Rzeszutek Wilk, xen-devel, Juergen Gross,
	Boris Ostrovsky, Stefano Stabellini, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams, Rob Herring

On Fri, Aug 28, 2020 at 01:47:59PM +0100, Marc Zyngier wrote:

> > So the arch_setup_msi_irq/etc is not really an arch hook, but some
> > infrastructure to support those 4 PCI root port drivers.
> 
> I happen to have a *really old* patch addressing Tegra [1], which
> I was never able to test (no HW). Rebasing it shouldn't be too hard,
> and maybe you can find someone internally willing to give it a spin?

Sure, that helps a bunch, I will ask internally if someone in that BU
can take a look.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-08-28 12:54           ` Jason Gunthorpe
@ 2020-08-28 13:52             ` Marc Zyngier
  0 siblings, 0 replies; 112+ messages in thread
From: Marc Zyngier @ 2020-08-28 13:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lorenzo Pieralisi, Bjorn Helgaas, Thomas Gleixner, LKML, x86,
	Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang, Jon Derrick,
	Lu Baolu, Wei Liu, K. Y. Srinivasan, Stephen Hemminger,
	Steve Wahl, Dimitri Sivanich, Russ Anderson, linux-pci,
	Bjorn Helgaas, Konrad Rzeszutek Wilk, xen-devel, Juergen Gross,
	Boris Ostrovsky, Stefano Stabellini, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams, Rob Herring

On 2020-08-28 13:54, Jason Gunthorpe wrote:
> On Fri, Aug 28, 2020 at 01:47:59PM +0100, Marc Zyngier wrote:
> 
>> > So the arch_setup_msi_irq/etc is not really an arch hook, but some
>> > infrastructure to support those 4 PCI root port drivers.
>> 
>> I happen to have a *really old* patch addressing Tegra [1], which
>> I was never able to test (no HW). Rebasing it shouldn't be too hard,
>> and maybe you can find someone internally willing to give it a spin?
> 
> Sure, that helps a bunch, I will ask internally if someone in that BU
> can take a look.

I just rebased[1] the patch onto -rc2, and provided a TODO list in the
commit message.

Thanks,

         M.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/tegra-msi
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-08-27 18:20   ` Bjorn Helgaas
  2020-08-28 11:21     ` Lorenzo Pieralisi
@ 2020-08-28 18:29     ` Thomas Gleixner
  1 sibling, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-28 18:29 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams,
	Rob Herring

On Thu, Aug 27 2020 at 13:20, Bjorn Helgaas wrote:
> On Wed, Aug 26, 2020 at 01:17:02PM +0200, Thomas Gleixner wrote:
>> Make the architectures and drivers which rely on them select them in Kconfig
>> and if not selected replace them by stub functions which emit a warning and
>> fail the PCI/MSI interrupt allocation.
>
> Sorry, I really don't understand this, so these are probably stupid
> questions.
>
> If CONFIG_PCI_MSI_ARCH_FALLBACKS is defined, we will supply
> implementations of:
>
>   arch_setup_msi_irq
>   arch_teardown_msi_irq
>   arch_setup_msi_irqs
>   arch_teardown_msi_irqs
>   default_teardown_msi_irqs    # non-weak
>
> You select CONFIG_PCI_MSI_ARCH_FALLBACKS for ia64, mips, powerpc,
> s390, sparc, and x86.  I see that all of those arches implement at
> least one of the functions above.  But x86 doesn't and I can't figure
> out why it needs to select CONFIG_PCI_MSI_ARCH_FALLBACKS.

X86 still has them at that point in the series and the next patch
removes them. I wanted to have the warnings in place before doing so.

> I assume there's a way to convert these arches to hierarchical irq
> domains so they wouldn't need this at all?  Is there a sample
> conversion to look at?

For a quick and dirty step it's pretty much the wrapper I used for XEN
and then make sure that the msi_domain pointer is populated is
pci_device::device.

> And I can't figure out what's special about tegra, rcar, and xilinx
> that makes them need it as well.

Those are old drivers from the time where ARM did not use hierarchical
irq domains and nobody cared to fix them up.

> Is there something I could grep for
> to identify them?

git grep arch_setup_msi_irq
git grep arch_teardown_msi_irq

> Is there a way to convert them so they don't need it?

Sure, it just needs some work and probably hardware to test.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 43/46] genirq/msi: Provide and use msi_domain_set_default_info_flags()
  2020-08-27  8:17   ` Marc Zyngier
@ 2020-08-28 18:42     ` Thomas Gleixner
  0 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-28 18:42 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Greg Kroah-Hartman, Rafael J. Wysocki,
	Megha Dey, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Thu, Aug 27 2020 at 09:17, Marc Zyngier wrote:
> On 2020-08-26 12:17, Thomas Gleixner wrote:
>>  #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> +void msi_domain_set_default_info_flags(struct msi_domain_info *info)
>> +{
>> +	/* Required so that a device latches a valid MSI message on startup 
>> */
>> +	info->flags |= MSI_FLAG_ACTIVATE_EARLY;
>
> As far as I remember the story behind this flag (it's been a while),
> it was working around a PCI-specific issue, hence being located in
> the PCI code.

Yes. Some cards misbehave when there is no valid message programmed and
MSI is enabled.

> Now, the "program the MSI before enabling it" concept makes sense no
> matter what bus this is on, and I wonder why we are even keeping this
> flag around.

> Can't we just drop it together with the check in
> msi_domain_alloc_irqs()?

I'm fine with that.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (46 preceding siblings ...)
  2020-08-28 11:41 ` [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Joerg Roedel
@ 2020-08-31  0:51 ` Lu Baolu
  2020-08-31  7:10   ` Thomas Gleixner
  2020-09-01  9:06 ` Boqun Feng
                   ` (5 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Lu Baolu @ 2020-08-31  0:51 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: baolu.lu, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Wei Liu, K. Y. Srinivasan, Stephen Hemminger,
	Steve Wahl, Dimitri Sivanich, Russ Anderson, linux-pci,
	Bjorn Helgaas, Lorenzo Pieralisi, Konrad Rzeszutek Wilk,
	xen-devel, Juergen Gross, Boris Ostrovsky, Stefano Stabellini,
	Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

Hi Thomas,

On 8/26/20 7:16 PM, Thomas Gleixner wrote:
> This is the second version of providing a base to support device MSI (non
> PCI based) and on top of that support for IMS (Interrupt Message Storm)
> based devices in a halfways architecture independent way.

After applying this patch series, the dmar_alloc_hwirq() helper doesn't
work anymore during boot. This causes the IOMMU driver to fail to
register the DMA fault handler and abort the IOMMU probe processing.
Is this a known issue?

Best regards,
baolu

> 
> The first version can be found here:
> 
>      https://lore.kernel.org/r/20200821002424.119492231@linutronix.de
> 
> It's still a mixed bag of bug fixes, cleanups and general improvements
> which are worthwhile independent of device MSI.
> 
> There are quite a bunch of issues to solve:
> 
>    - X86 does not use the device::msi_domain pointer for historical reasons
>      and due to XEN, which makes it impossible to create an architecture
>      agnostic device MSI infrastructure.
> 
>    - X86 has it's own msi_alloc_info data type which is pointlessly
>      different from the generic version and does not allow to share code.
> 
>    - The logic of composing MSI messages in an hierarchy is busted at the
>      core level and of course some (x86) drivers depend on that.
> 
>    - A few minor shortcomings as usual
> 
> This series addresses that in several steps:
> 
>   1) Accidental bug fixes
> 
>        iommu/amd: Prevent NULL pointer dereference
> 
>   2) Janitoring
> 
>        x86/init: Remove unused init ops
>        PCI: vmd: Dont abuse vector irqomain as parent
>        x86/msi: Remove pointless vcpu_affinity callback
> 
>   3) Sanitizing the composition of MSI messages in a hierarchy
>   
>        genirq/chip: Use the first chip in irq_chip_compose_msi_msg()
>        x86/msi: Move compose message callback where it belongs
> 
>   4) Simplification of the x86 specific interrupt allocation mechanism
> 
>        x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency
>        x86/irq: Add allocation type for parent domain retrieval
>        iommu/vt-d: Consolidate irq domain getter
>        iommu/amd: Consolidate irq domain getter
>        iommu/irq_remapping: Consolidate irq domain lookup
> 
>   5) Consolidation of the X86 specific interrupt allocation mechanism to be as close
>      as possible to the generic MSI allocation mechanism which allows to get rid
>      of quite a bunch of x86'isms which are pointless
> 
>        x86/irq: Prepare consolidation of irq_alloc_info
>        x86/msi: Consolidate HPET allocation
>        x86/ioapic: Consolidate IOAPIC allocation
>        x86/irq: Consolidate DMAR irq allocation
>        x86/irq: Consolidate UV domain allocation
>        PCI/MSI: Rework pci_msi_domain_calc_hwirq()
>        x86/msi: Consolidate MSI allocation
>        x86/msi: Use generic MSI domain ops
> 
>    6) x86 specific cleanups to remove the dependency on arch_*_msi_irqs()
> 
>        x86/irq: Move apic_post_init() invocation to one place
>        x86/pci: Reducde #ifdeffery in PCI init code
>        x86/irq: Initialize PCI/MSI domain at PCI init time
>        irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
>        PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
>        PCI/MSI: Provide pci_dev_has_special_msi_domain() helper
>        x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
>        x86/xen: Rework MSI teardown
>        x86/xen: Consolidate XEN-MSI init
>        irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
>        x86/xen: Wrap XEN MSI management into irqdomain
>        iommm/vt-d: Store irq domain in struct device
>        iommm/amd: Store irq domain in struct device
>        x86/pci: Set default irq domain in pcibios_add_device()
>        PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
>        x86/irq: Cleanup the arch_*_msi_irqs() leftovers
>        x86/irq: Make most MSI ops XEN private
>        iommu/vt-d: Remove domain search for PCI/MSI[X]
>        iommu/amd: Remove domain search for PCI/MSI
> 
>    7) X86 specific preparation for device MSI
> 
>        x86/irq: Add DEV_MSI allocation type
>        x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI
> 
>    8) Generic device MSI infrastructure
>        platform-msi: Provide default irq_chip:: Ack
>        genirq/proc: Take buslock on affinity write
>        genirq/msi: Provide and use msi_domain_set_default_info_flags()
>        platform-msi: Add device MSI infrastructure
>        irqdomain/msi: Provide msi_alloc/free_store() callbacks
> 
>    9) POC of IMS (Interrupt Message Storm) irq domain and irqchip
>       implementations for both device array and queue storage.
> 
>        irqchip: Add IMS (Interrupt Message Storm) driver - NOT FOR MERGING
> 
> Changes vs. V1:
> 
>     - Addressed various review comments and addressed the 0day fallout.
>       - Corrected the XEN logic (Jürgen)
>       - Make the arch fallback in PCI/MSI opt-in not opt-out (Bjorn)
> 
>     - Fixed the compose MSI message inconsistency
> 
>     - Ensure that the necessary flags are set for device SMI
> 
>     - Make the irq bus logic work for affinity setting to prepare
>       support for IMS storage in queue memory. It turned out to be
>       less scary than I feared.
> 
>     - Remove leftovers in iommu/intel|amd
> 
>     - Reworked the IMS POC driver to cover queue storage so Jason can have a
>       look whether that fits the needs of MLX devices.
> 
> The whole lot is also available from git:
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git device-msi
> 
> This has been tested on Intel/AMD/KVM but lacks testing on:
> 
>      - HYPERV (-ENODEV)
>      - VMD enabled systems (-ENODEV)
>      - XEN (-ENOCLUE)
>      - IMS (-ENODEV)
> 
>      - Any non-X86 code which might depend on the broken compose MSI message
>        logic. Marc excpects not much fallout, but agrees that we need to fix
>        it anyway.
> 
> #1 - #3 should be applied unconditionally for obvious reasons
> #4 - #6 are wortwhile cleanups which should be done independent of device MSI
> 
> #7 - #8 look promising to cleanup the platform MSI implementation
>       	independent of #8, but I neither had cycles nor the stomach to
>       	tackle that.
> 
> #9	is obviously just for the folks interested in IMS
> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-08-31  0:51 ` Lu Baolu
@ 2020-08-31  7:10   ` Thomas Gleixner
  2020-08-31  7:29     ` Lu Baolu
  0 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-08-31  7:10 UTC (permalink / raw)
  To: Lu Baolu, LKML
  Cc: baolu.lu, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Wei Liu, K. Y. Srinivasan, Stephen Hemminger,
	Steve Wahl, Dimitri Sivanich, Russ Anderson, linux-pci,
	Bjorn Helgaas, Lorenzo Pieralisi, Konrad Rzeszutek Wilk,
	xen-devel, Juergen Gross, Boris Ostrovsky, Stefano Stabellini,
	Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

On Mon, Aug 31 2020 at 08:51, Lu Baolu wrote:
> On 8/26/20 7:16 PM, Thomas Gleixner wrote:
>> This is the second version of providing a base to support device MSI (non
>> PCI based) and on top of that support for IMS (Interrupt Message Storm)
>> based devices in a halfways architecture independent way.
>
> After applying this patch series, the dmar_alloc_hwirq() helper doesn't
> work anymore during boot. This causes the IOMMU driver to fail to
> register the DMA fault handler and abort the IOMMU probe processing.
> Is this a known issue?

See replies to patch 15/46 or pull the git tree. It has the issue fixed.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-08-31  7:10   ` Thomas Gleixner
@ 2020-08-31  7:29     ` Lu Baolu
  0 siblings, 0 replies; 112+ messages in thread
From: Lu Baolu @ 2020-08-31  7:29 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: baolu.lu, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Wei Liu, K. Y. Srinivasan, Stephen Hemminger,
	Steve Wahl, Dimitri Sivanich, Russ Anderson, linux-pci,
	Bjorn Helgaas, Lorenzo Pieralisi, Konrad Rzeszutek Wilk,
	xen-devel, Juergen Gross, Boris Ostrovsky, Stefano Stabellini,
	Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

Hi Thomas,

On 2020/8/31 15:10, Thomas Gleixner wrote:
> On Mon, Aug 31 2020 at 08:51, Lu Baolu wrote:
>> On 8/26/20 7:16 PM, Thomas Gleixner wrote:
>>> This is the second version of providing a base to support device MSI (non
>>> PCI based) and on top of that support for IMS (Interrupt Message Storm)
>>> based devices in a halfways architecture independent way.
>>
>> After applying this patch series, the dmar_alloc_hwirq() helper doesn't
>> work anymore during boot. This causes the IOMMU driver to fail to
>> register the DMA fault handler and abort the IOMMU probe processing.
>> Is this a known issue?
> 
> See replies to patch 15/46 or pull the git tree. It has the issue fixed.

Ah! Yes. Sorry for the noise.

Beset regards,
baolu

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 24/46] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
  2020-08-26 11:16 ` [patch V2 24/46] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI Thomas Gleixner
  2020-08-26 20:47   ` Marc Zyngier
@ 2020-08-31 14:39   ` Jason Gunthorpe
  2020-09-30 12:45     ` Derrick, Jonathan
  1 sibling, 1 reply; 112+ messages in thread
From: Jason Gunthorpe @ 2020-08-31 14:39 UTC (permalink / raw)
  To: Thomas Gleixner, Jon Derrick
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Lu Baolu, Wei Liu, K. Y. Srinivasan, Stephen Hemminger,
	Steve Wahl, Dimitri Sivanich, Russ Anderson, linux-pci,
	Bjorn Helgaas, Lorenzo Pieralisi, Konrad Rzeszutek Wilk,
	xen-devel, Juergen Gross, Boris Ostrovsky, Stefano Stabellini,
	Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Dave Jiang, Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian,
	Dan Williams

On Wed, Aug 26, 2020 at 01:16:52PM +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Devices on the VMD bus use their own MSI irq domain, but it is not
> distinguishable from regular PCI/MSI irq domains. This is required
> to exclude VMD devices from getting the irq domain pointer set by
> interrupt remapping.
> 
> Override the default bus token.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>  drivers/pci/controller/vmd.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> +++ b/drivers/pci/controller/vmd.c
> @@ -579,6 +579,12 @@ static int vmd_enable_domain(struct vmd_
>  		return -ENODEV;
>  	}
>  
> +	/*
> +	 * Override the irq domain bus token so the domain can be distinguished
> +	 * from a regular PCI/MSI domain.
> +	 */
> +	irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
> +

Having the non-transparent-bridge hold a MSI table and
multiplex/de-multiplex IRQs looks like another good use case for
something close to pci_subdevice_msi_create_irq_domain()?

If each device could have its own internal MSI-X table programmed
properly things would work alot better. Disable capture/remap of the
MSI range in the NTB.

Then each device could have a proper non-multiplexed interrupt
delivered to the CPU.. Affinity would work properly, no need to share
IRQs (eg vmd_irq() goes away)/etc.

Something for the VMD maintainers to think about at least.

As I hear more about NTB these days a full MSI scheme for NTB seems
interesting to have in the PCI-E core code..

Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 46/46] irqchip: Add IMS (Interrupt Message Storm) driver - NOT FOR MERGING
  2020-08-26 11:17 ` [patch V2 46/46] irqchip: Add IMS (Interrupt Message Storm) driver - NOT FOR MERGING Thomas Gleixner
@ 2020-08-31 14:45   ` Jason Gunthorpe
  0 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2020-08-31 14:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, Aug 26, 2020 at 01:17:14PM +0200, Thomas Gleixner wrote:
> + * ims_queue_info - Information to create an IMS queue domain
> + * @queue_lock:		Callback which informs the device driver that
> + *			an interrupt management operation starts.
> + * @queue_sync_unlock:	Callback which informs the device driver that an
> + *			interrupt management operation ends.
> +
> + * @queue_get_shadow:   Callback to retrieve te shadow storage for a MSI
> + *			entry associated to a queue. The queue is
> + *			identified by the device struct which is used for
> + *			allocating interrupts and the msi entry index.
> + *
> + * @queue_lock() and @queue_sync_unlock() are only called for management
> + * operations on a particular interrupt: request, free, enable, disable,
> + * affinity setting.  These functions are never called from atomic context,
> + * like low level interrupt handling code. The purpose of these functions
> + * is to signal the device driver the start and end of an operation which
> + * affects the IMS queue shadow state. @queue_lock() allows the driver to
> + * do preperatory work, e.g. locking. Note, that @queue_lock() has to
> + * preserve the sleepable state on return. That means the driver cannot
> + * disable preemption and (soft)interrupts in @queue_lock and then undo
> + * that operation in @queue_sync_unlock() which restricts the lock types
> + * for eventual serialization of these operations to sleepable locks. Of
> + * course the driver can disable preemption and (soft)interrupts
> + * temporarily for internal work.
> + *
> + * On @queue_sync_unlock() the driver has to check whether the shadow state
> + * changed and issue a command to update the hardware state and wait for
> + * the command to complete. If the command fails or times out then the
> + * driver has to take care of the resulting mess as this is called from
> + * functions which have no return value and none of the callers can deal
> + * with the failure. The lock which is used by the driver to protect a
> + * operation sequence must obviously not be released before the command
> + * completes or fails. Otherwise new operations on the same interrupt line
> + * could take place and change the shadow state before the driver was able
> + * to compose the command.

I haven't looked through everything in detail, but this does look like
it is good for the mlx5 devices. Looked like it was only one small
update to the set_affinity, so not very disruptive?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (47 preceding siblings ...)
  2020-08-31  0:51 ` Lu Baolu
@ 2020-09-01  9:06 ` Boqun Feng
  2020-09-03 16:35 ` Raj, Ashok
                   ` (4 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Boqun Feng @ 2020-09-01  9:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams,
	Michael Kelley

Hi Thomas,

On Wed, Aug 26, 2020 at 01:16:28PM +0200, Thomas Gleixner wrote:
[...]
> 
> The whole lot is also available from git:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git device-msi
> 
> This has been tested on Intel/AMD/KVM but lacks testing on:
> 
>     - HYPERV (-ENODEV)

FWIW, I did a build and boot test in a hyperv guest with your
development branch, the latest commit is 71cbf478eb6f ("irqchip: Add
IMS (Interrupt Message Storm) driver - NOT FOR MERGING"). And everything
seemed working fine.

If you want me to set/unset a particular CONFIG option or run some
command for testing purposes, please let me know ;-)

Regards,
Bqoun

>     - VMD enabled systems (-ENODEV)
>     - XEN (-ENOCLUE)
>     - IMS (-ENODEV)
> 
>     - Any non-X86 code which might depend on the broken compose MSI message
>       logic. Marc excpects not much fallout, but agrees that we need to fix
>       it anyway.
> 
> #1 - #3 should be applied unconditionally for obvious reasons
> #4 - #6 are wortwhile cleanups which should be done independent of device MSI
> 
> #7 - #8 look promising to cleanup the platform MSI implementation
>      	independent of #8, but I neither had cycles nor the stomach to
>      	tackle that.
> 
> #9	is obviously just for the folks interested in IMS
> 
> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (48 preceding siblings ...)
  2020-09-01  9:06 ` Boqun Feng
@ 2020-09-03 16:35 ` Raj, Ashok
  2020-09-03 18:12   ` Thomas Gleixner
  2020-09-08  3:39 ` Russ Anderson
                   ` (3 subsequent siblings)
  53 siblings, 1 reply; 112+ messages in thread
From: Raj, Ashok @ 2020-09-03 16:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams,
	Ashok Raj

Hi Thomas,

Thanks a ton for jumping in helping on straightening it for IMS!!!


On Wed, Aug 26, 2020 at 01:16:28PM +0200, Thomas Gleixner wrote:
> This is the second version of providing a base to support device MSI (non
> PCI based) and on top of that support for IMS (Interrupt Message Storm)

s/Storm/Store

maybe pun intended :-)

> based devices in a halfways architecture independent way.

You mean "halfways" because the message addr and data follow guidelines
per arch (x86 or such), but the location of the storage isn't dictated
by architecture? or did you have something else in mind? 

> 
> The first version can be found here:
> 
>     https://lore.kernel.org/r/20200821002424.119492231@linutronix.de
> 

[snip]

> 
> Changes vs. V1:
> 
>    - Addressed various review comments and addressed the 0day fallout.
>      - Corrected the XEN logic (Jürgen)
>      - Make the arch fallback in PCI/MSI opt-in not opt-out (Bjorn)
> 
>    - Fixed the compose MSI message inconsistency
> 
>    - Ensure that the necessary flags are set for device SMI

is that supposed to be MSI? 

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-09-03 16:35 ` Raj, Ashok
@ 2020-09-03 18:12   ` Thomas Gleixner
  0 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-09-03 18:12 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams,
	Ashok Raj

Ashok,

On Thu, Sep 03 2020 at 09:35, Ashok Raj wrote:
> On Wed, Aug 26, 2020 at 01:16:28PM +0200, Thomas Gleixner wrote:
>> This is the second version of providing a base to support device MSI (non
>> PCI based) and on top of that support for IMS (Interrupt Message Storm)
>
> s/Storm/Store
>
> maybe pun intended :-)

Maybe? :)

>> based devices in a halfways architecture independent way.
>
> You mean "halfways" because the message addr and data follow guidelines
> per arch (x86 or such), but the location of the storage isn't dictated
> by architecture? or did you have something else in mind?

Yes, the actual message adress and data format are architecture
specific, but we also have x86 specific allocation info format which
needs an arch callback unfortunately.

>>    - Ensure that the necessary flags are set for device SMI
>
> is that supposed to be MSI? 

Of course, but SMI is a better match for Message Storm :)

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (49 preceding siblings ...)
  2020-09-03 16:35 ` Raj, Ashok
@ 2020-09-08  3:39 ` Russ Anderson
  2020-09-25 15:29 ` Qian Cai
                   ` (2 subsequent siblings)
  53 siblings, 0 replies; 112+ messages in thread
From: Russ Anderson @ 2020-09-08  3:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, linux-pci,
	Bjorn Helgaas, Lorenzo Pieralisi, Konrad Rzeszutek Wilk,
	xen-devel, Juergen Gross, Boris Ostrovsky, Stefano Stabellini,
	Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

On Wed, Aug 26, 2020 at 01:16:28PM +0200, Thomas Gleixner wrote:
> This is the second version of providing a base to support device MSI (non
> PCI based) and on top of that support for IMS (Interrupt Message Storm)
> based devices in a halfways architecture independent way.

Booted with quick testing on a 32 socket, 1536 CPU, 12 TB memory
Cascade Lake system and a 8 socket, 144 CPU, 3 TB memory
Cooper Lake system without any obvious regression.


-- 
Russ Anderson,  SuperDome Flex Linux Kernel Group Manager
HPE - Hewlett Packard Enterprise (formerly SGI)  rja@hpe.com

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 14/46] x86/ioapic: Consolidate IOAPIC allocation
  2020-08-26 11:16 ` [patch V2 14/46] x86/ioapic: Consolidate IOAPIC allocation Thomas Gleixner
@ 2020-09-08 13:35   ` Wei Liu
  0 siblings, 0 replies; 112+ messages in thread
From: Wei Liu @ 2020-09-08 13:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, Aug 26, 2020 at 01:16:42PM +0200, Thomas Gleixner wrote:
...
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -101,7 +101,7 @@ static int hyperv_irq_remapping_alloc(st
>  	 * in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_
>  	 * affinity() set vector and dest_apicid directly into IO-APIC entry.
>  	 */
> -	irq_data->chip_data = info->ioapic_entry;
> +	irq_data->chip_data = info->ioapic.entry;

Not sure if it is required for such a trivial change but here you go:

Acked-by: Wei Liu <wei.liu@kernel.org>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 18/46] x86/msi: Consolidate MSI allocation
  2020-08-26 11:16 ` [patch V2 18/46] x86/msi: Consolidate MSI allocation Thomas Gleixner
@ 2020-09-08 13:36   ` Wei Liu
  0 siblings, 0 replies; 112+ messages in thread
From: Wei Liu @ 2020-09-08 13:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, Aug 26, 2020 at 01:16:46PM +0200, Thomas Gleixner wrote:
[...]
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1534,7 +1534,7 @@ static struct irq_chip hv_msi_irq_chip =
>  static irq_hw_number_t hv_msi_domain_ops_get_hwirq(struct msi_domain_info *info,
>  						   msi_alloc_info_t *arg)
>  {
> -	return arg->msi_hwirq;
> +	return arg->hwirq;
>  }

Acked-by: Wei Liu <wei.liu@kernel.org>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-08-26 11:17 ` [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable Thomas Gleixner
                     ` (2 preceding siblings ...)
  2020-08-27 18:20   ` Bjorn Helgaas
@ 2020-09-25 13:54   ` Qian Cai
  2020-09-26 12:38     ` Vasily Gorbik
  3 siblings, 1 reply; 112+ messages in thread
From: Qian Cai @ 2020-09-25 13:54 UTC (permalink / raw)
  To: Thomas Gleixner, LKML, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, linux-s390, Stephen Rothwell, linux-next
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, 2020-08-26 at 13:17 +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
> requires them or not. Architectures which are fully utilizing hierarchical
> irq domains should never call into that code.
> 
> It's not only architectures which depend on that by implementing one or
> more of the weak functions, there is also a bunch of drivers which relies
> on the weak functions which invoke msi_controller::setup_irq[s] and
> msi_controller::teardown_irq.
> 
> Make the architectures and drivers which rely on them select them in Kconfig
> and if not selected replace them by stub functions which emit a warning and
> fail the PCI/MSI interrupt allocation.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Today's linux-next will have some warnings on s390x:

.config: https://gitlab.com/cailca/linux-mm/-/blob/master/s390.config

WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS
  Depends on [n]: PCI [=n]
  Selected by [y]:
  - S390 [=y]

WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS
  Depends on [n]: PCI [=n]
  Selected by [y]:
  - S390 [=y]

> ---
> V2: Make the architectures (and drivers) which need the fallbacks select them
>     and not the other way round (Bjorn).
> ---
>  arch/ia64/Kconfig              |    1 +
>  arch/mips/Kconfig              |    1 +
>  arch/powerpc/Kconfig           |    1 +
>  arch/s390/Kconfig              |    1 +
>  arch/sparc/Kconfig             |    1 +
>  arch/x86/Kconfig               |    1 +
>  drivers/pci/Kconfig            |    3 +++
>  drivers/pci/controller/Kconfig |    3 +++
>  drivers/pci/msi.c              |    3 ++-
>  include/linux/msi.h            |   31 ++++++++++++++++++++++++++-----
>  10 files changed, 40 insertions(+), 6 deletions(-)
> 
> --- a/arch/ia64/Kconfig
> +++ b/arch/ia64/Kconfig
> @@ -56,6 +56,7 @@ config IA64
>  	select NEED_DMA_MAP_STATE
>  	select NEED_SG_DMA_LENGTH
>  	select NUMA if !FLATMEM
> +	select PCI_MSI_ARCH_FALLBACKS
>  	default y
>  	help
>  	  The Itanium Processor Family is Intel's 64-bit successor to
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -86,6 +86,7 @@ config MIPS
>  	select MODULES_USE_ELF_REL if MODULES
>  	select MODULES_USE_ELF_RELA if MODULES && 64BIT
>  	select PERF_USE_VMALLOC
> +	select PCI_MSI_ARCH_FALLBACKS
>  	select RTC_LIB
>  	select SYSCTL_EXCEPTION_TRACE
>  	select VIRT_TO_BUS
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -246,6 +246,7 @@ config PPC
>  	select OLD_SIGACTION			if PPC32
>  	select OLD_SIGSUSPEND
>  	select PCI_DOMAINS			if PCI
> +	select PCI_MSI_ARCH_FALLBACKS
>  	select PCI_SYSCALL			if PCI
>  	select PPC_DAWR				if PPC64
>  	select RTC_LIB
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -185,6 +185,7 @@ config S390
>  	select OLD_SIGSUSPEND3
>  	select PCI_DOMAINS		if PCI
>  	select PCI_MSI			if PCI
> +	select PCI_MSI_ARCH_FALLBACKS
>  	select SPARSE_IRQ
>  	select SYSCTL_EXCEPTION_TRACE
>  	select THREAD_INFO_IN_TASK
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -43,6 +43,7 @@ config SPARC
>  	select GENERIC_STRNLEN_USER
>  	select MODULES_USE_ELF_RELA
>  	select PCI_SYSCALL if PCI
> +	select PCI_MSI_ARCH_FALLBACKS
>  	select ODD_RT_SIGACTION
>  	select OLD_SIGSUSPEND
>  	select CPU_NO_EFFICIENT_FFS
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -225,6 +225,7 @@ config X86
>  	select NEED_SG_DMA_LENGTH
>  	select PCI_DOMAINS			if PCI
>  	select PCI_LOCKLESS_CONFIG		if PCI
> +	select PCI_MSI_ARCH_FALLBACKS
>  	select PERF_EVENTS
>  	select RTC_LIB
>  	select RTC_MC146818_LIB
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -56,6 +56,9 @@ config PCI_MSI_IRQ_DOMAIN
>  	depends on PCI_MSI
>  	select GENERIC_MSI_IRQ_DOMAIN
>  
> +config PCI_MSI_ARCH_FALLBACKS
> +	bool
> +
>  config PCI_QUIRKS
>  	default y
>  	bool "Enable PCI quirk workarounds" if EXPERT
> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ -41,6 +41,7 @@ config PCI_TEGRA
>  	bool "NVIDIA Tegra PCIe controller"
>  	depends on ARCH_TEGRA || COMPILE_TEST
>  	depends on PCI_MSI_IRQ_DOMAIN
> +	select PCI_MSI_ARCH_FALLBACKS
>  	help
>  	  Say Y here if you want support for the PCIe host controller found
>  	  on NVIDIA Tegra SoCs.
> @@ -67,6 +68,7 @@ config PCIE_RCAR_HOST
>  	bool "Renesas R-Car PCIe host controller"
>  	depends on ARCH_RENESAS || COMPILE_TEST
>  	depends on PCI_MSI_IRQ_DOMAIN
> +	select PCI_MSI_ARCH_FALLBACKS
>  	help
>  	  Say Y here if you want PCIe controller support on R-Car SoCs in host
>  	  mode.
> @@ -103,6 +105,7 @@ config PCIE_XILINX_CPM
>  	bool "Xilinx Versal CPM host bridge support"
>  	depends on ARCH_ZYNQMP || COMPILE_TEST
>  	select PCI_HOST_COMMON
> +	select PCI_MSI_ARCH_FALLBACKS
>  	help
>  	  Say 'Y' here if you want kernel support for the
>  	  Xilinx Versal CPM host bridge.
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -58,8 +58,8 @@ static void pci_msi_teardown_msi_irqs(st
>  #define pci_msi_teardown_msi_irqs	arch_teardown_msi_irqs
>  #endif
>  
> +#ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
>  /* Arch hooks */
> -
>  int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
>  {
>  	struct msi_controller *chip = dev->bus->msi;
> @@ -132,6 +132,7 @@ void __weak arch_teardown_msi_irqs(struc
>  {
>  	return default_teardown_msi_irqs(dev);
>  }
> +#endif /* CONFIG_PCI_MSI_ARCH_FALLBACKS */
>  
>  static void default_restore_msi_irq(struct pci_dev *dev, int irq)
>  {
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -193,17 +193,38 @@ void pci_msi_mask_irq(struct irq_data *d
>  void pci_msi_unmask_irq(struct irq_data *data);
>  
>  /*
> - * The arch hooks to setup up msi irqs. Those functions are
> - * implemented as weak symbols so that they /can/ be overriden by
> - * architecture specific code if needed.
> + * The arch hooks to setup up msi irqs. Default functions are implemented
> + * as weak symbols so that they /can/ be overriden by architecture specific
> + * code if needed. These hooks must be enabled by the architecture or by
> + * drivers which depend on them via msi_controller based MSI handling.
> + *
> + * If CONFIG_PCI_MSI_ARCH_FALLBACKS is not selected they are replaced by
> + * stubs with warnings.
>   */
> +#ifdef CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS
>  int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc);
>  void arch_teardown_msi_irq(unsigned int irq);
>  int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
>  void arch_teardown_msi_irqs(struct pci_dev *dev);
> -void arch_restore_msi_irqs(struct pci_dev *dev);
> -
>  void default_teardown_msi_irqs(struct pci_dev *dev);
> +#else
> +static inline int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int
> type)
> +{
> +	WARN_ON_ONCE(1);
> +	return -ENODEV;
> +}
> +
> +static inline void arch_teardown_msi_irqs(struct pci_dev *dev)
> +{
> +	WARN_ON_ONCE(1);
> +}
> +#endif
> +
> +/*
> + * The restore hooks are still available as they are useful even
> + * for fully irq domain based setups. Courtesy to XEN/X86.
> + */
> +void arch_restore_msi_irqs(struct pci_dev *dev);
>  void default_restore_msi_irqs(struct pci_dev *dev);
>  
>  struct msi_controller {
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (50 preceding siblings ...)
  2020-09-08  3:39 ` Russ Anderson
@ 2020-09-25 15:29 ` Qian Cai
  2020-09-25 15:49   ` Peter Zijlstra
  2020-09-29 23:03 ` [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Dey, Megha
  2020-11-12 12:55 ` REGRESSION: " Jason Gunthorpe
  53 siblings, 1 reply; 112+ messages in thread
From: Qian Cai @ 2020-09-25 15:29 UTC (permalink / raw)
  To: Thomas Gleixner, LKML, Stephen Rothwell, linux-next
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Jason Gunthorpe, Dave Jiang,
	Alex Williamson, Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, 2020-08-26 at 13:16 +0200, Thomas Gleixner wrote:
> This is the second version of providing a base to support device MSI (non
> PCI based) and on top of that support for IMS (Interrupt Message Storm)
> based devices in a halfways architecture independent way.
> 
> The first version can be found here:
> 
>     https://lore.kernel.org/r/20200821002424.119492231@linutronix.de
> 
> It's still a mixed bag of bug fixes, cleanups and general improvements
> which are worthwhile independent of device MSI.

Reverting the part of this patchset on the top of today's linux-next fixed an
boot issue on HPE ProLiant DL560 Gen10, i.e.,

$ git revert --no-edit 13b90cadfc29..bc95fd0d7c42

.config: https://gitlab.com/cailca/linux-mm/-/blob/master/x86.config

It looks like the crashes happen in the interrupt remapping code where they are
only able to to generate partial call traces.

[    1.912386][    T0] ACPI: X2APIC_NMI (uid[0xf5] high level 9983][    T0] ... MAX_LOCK_DEPTH:          48
[    7.914876][    T0] ... MAX_LOCKDEP_KEYS:        8192
[    7.919942][    T0] ... CLASSHASH_SIZE:          4096
[    7.925009][    T0] ... MAX_LOCKDEP_ENTRIES:     32768
[    7.930163][    T0] ... MAX_LOCKDEP_CHAINS:      65536
[    7.935318][    T0] ... CHAINHASH_SIZE:          32768
[    7.940473][    T0]  memory used by lock dependency info: 6301 kB
[    7.946586][    T0]  memory used for stack traces: 4224 kB
[    7.952088][    T0]  per task-struct memory footprint: 1920 bytes
[    7.968312][    T0] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[    7.980281][    T0] ACPI: Core revision 20200717
[    7.993343][    T0] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635855245 ns
[    8.003270][    T0] APIC: Switch to symmetric I/O mode setup
[    8.008951][    T0] DMAR: Host address width 46
[    8.013512][    T0] DMAR: DRHD base: 0x000000e5ffc000 flags: 0x0
[    8.019680][    T0] DMAR: dmar0: reg_base_addr e5ffc000 ver 1:0 cap 8d2078c106f0466 [    T0] DMAR-IR: IOAPIC id 15 under DRHD base  0xe5ffc000 IOMMU 0
[    8.420990][    T0] DMAR-IR: IOAPIC id 8 under DRHD base  0xddffc000 IOMMU 15
[    8.428166][    T0] DMAR-IR: IOAPIC id 9 under DRHD base  0xddffc000 IOMMU 15
[    8.435341][    T0] DMAR-IR: HPET id 0 under DRHD base 0xddffc000
[    8.441456][    T0] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    8.457911][    T0] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    8.466614][    T0] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    8.474295][    T0] #PF: supervisor instruction fetch in kernel mode
[    8.480669][    T0] #PF: error_code(0x0010) - not-present page
[    8.486518][    T0] PGD 0 P4D 0 
[    8.489757][    T0] Oops: 0010 [#1] SMP KASAN PTI
[    8.494476][    T0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I       5.9.0-rc6-next-20200925 #2
[    8.503987][    T0] Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10, BIOS U34 11/13/2019
[    8.513238][    T0] RIP: 0010:0x0
[    8.516562][    T0] Code: Bad RIP v

or

[    2.906744][    T0] ACPI: X2API32, address 0xfec68000, GSI 128-135
[    2.907063][    T0] IOAPIC[15]: apic_id 29, version 32, address 0xfec70000, GSI 136-143
[    2.907071][    T0] IOAPIC[16]: apic_id 30, version 32, address 0xfec78000, GSI 144-151
[    2.907079][    T0] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    2.907084][    T0] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    2.907100][    T0] Using ACPI (MADT) for SMP configuration information
[    2.907105][    T0] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[    2.907116][    T0] ACPI: SPCR: console: uart,mmio,0x0,115200
[    2.907121][    T0] TSC deadline timer available
[    2.907126][    T0] smpboot: Allowing 144 CPUs, 0 hotplug CPUs
[    2.907163][    T0] [mem 0xd0000000-0xfdffffff] available for PCI devices
[    2.907175][    T0] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    2.914541][    T0] setup_percpu: NR_CPUS:256 nr_cpumask_bits:144 nr_cpu_ids:144 nr_node_ids:4
[    2.926109][   466 ecap f020df
[    9.134709][    T0] DMAR: DRHD base: 0x000000f5ffc000 flags: 0x0
[    9.140867][    T0] DMAR: dmar8: reg_base_addr f5ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    9.149610][    T0] DMAR: DRHD base: 0x000000f7ffc000 flags: 0x0
[    9.155762][    T0] DMAR: dmar9: reg_base_addr f7ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    9.164491][    T0] DMAR: DRHD base: 0x000000f9ffc000 flags: 0x0
[    9.170645][    T0] DMAR: dmar10: reg_base_addr f9ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    9.179476][    T0] DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
[    9.185626][    T0] DMAR: dmar11: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    9.194442][    T0] DMAR: DRHD base: 0x000000dfffc000 flags: 0x0
[    9.200587][    T0] DMAR: dmar12: reg_base_addr dfffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    9.209418][    T0] DMAR: DRHD base: 0x000000e1ffc000 flags: 0x0
[    9.215551][    T0] DMAR: dmar13: reg_base_addr e1ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    9.224367][    T0] DMAR: DRHD base: 0x000000e3ffc83][    T0]  msi_domain_alloc+0x8e/0x280
[    9.615015][    T0]  __irq_domain_a8992cd
[    9.711906][    T0] R10: ffffffff85407d78 R11: fffffbfff18992cc R12: ffffffff8546ffc0
[    9.719761][    T0] R13: 0000000000000098 R14: ffff888106e63a40 R15: 0000000000000001
[    9.727617][    T0] FS:  0000000000000000(0000) GS:ffff8887df800000(0000) knlGS:0000000000000000
[    9.736431][    T0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.742892][    T0] CR2: ffffffffffffffd6 CR3: 0000001ba7814001 CR4: 00000000000606b0
[    9.750747][    T0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    9.758601][    T0] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    9.766456][    T0] Kernel panic - not syncing: Fatal exception
[    9.772547][    T0] ---[ end Kernel panic - not syncing: Fatal exception ]---

The working boot (without those patches) looks like this:

[    1.913963][    T0] ACPI: X2APIC_NMI (uid[0xf4] high level lint[0x1])
[    1.913967][    T0] ACPI: X2APIC_NMI (uid[0xf5] high level lint[0x1])
[    1.913970][    T0] ACPI: X2APIC_NMI (uid[0xf6] high level lint[0x1])
[    1.913974][    T0] ACPI: X2APIC_NMI (uid[0xf7] high level lint[0x1])
[    1.914017][    T0] IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
[    1.914032][    T0] IOAPIC[1]: apic_id 9, version 32, address 0xfec01000, GSI 24-31
[    1.914039][    T0] IOAPIC[2]: apic_id 10, version 32, address 0xfec08000, GSI 32-39
[    1.914047][    T0] IOAPIC[3]: apic_id 11, version 32, address 0xfec10000, GSI 40-47
[    1.914054][    T0] IOAPIC[4]: apic_id 12, version 32, address 0xfec18000, GSI 48-55
[    1.914062][    T0] IOAPIC[5]: apic_id 15, version 32, address 0xfec20000, GSI 56-63
[    1.[    7.994567][    T0] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[    8.006541][    T0] ACPI: Core revision 20200717
[    8.019713][    T0] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635855245 ns
[    8.029672][    T0] APIC: Switch to symmetric I/O mode setup
[    8.035354][    T0] DMAR: Host address width 46
[    8.039915][    T0] DMAR: DRHD base: 0x000000e5ffc000 flags: 0x0
[    8.046095][    T0] DMAR: dmar0: reg_base_addr e5ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    8.054840][    T0] DMAR: DRHD base: 0x000000e7ffc000 flags: 0x0
[    8.060997][    T0] DMAR: dmar1: reg_base_addr e7ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    8.069740][    T0] DMAR: DRHD base: 0x000000e9ffc000 flags: 0x0
[    8.075872][    T0] DMAR: dmar2: reg_base_addr e9ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    8.084615][    T0] DMAR: DRHD base: 0x000000ebffc000 flags: 0x0
[    8.090761][    T0] DMAR: dmar3: reg_base_addr ebffc000 ver 1:0 cap 8d2078c106f0466 ecap fMAR-IR: Enabled IRQ remapping in x2apic mode
[    8.513491][    T0] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    8.568289][    T0] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2b3e459bf4c, max_idle_ns: 440795289890 ns
[    8.579576][    T0] Calibrating delay loop (skipped), value calculated using timer frequency.. 6000.00 BogoMIPS (lpj=30000000)
[    8.589574][    T0] pid_max: default: 147456 minimum: 1152
[    8.714025][    T0] efi: memattr: Entry attributes invalid: RO and XP bits both cleared
[    8.719577][    T0] efi: memattr: ! 0x0000a057a000-0x0000a05b4fff [Runtime Code       |RUN|  |  |  |  |  |  |  |   |  |  |  |  ]
[    8.775355][    T0] Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes, vmalloc)
[    8.798868][    T0] Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes, vmalloc)
[    8.811550][    T0] Mount-cache hash table entries: 131072 (order: 8, 1048576 bytes, vmalloc)
[    8.820076][    T0] Mountpoint-cache hash table entries: 131072 (order: 8, 1048576 bytes, vmalloc)
[    8.879327][    T0] mce: CPU0: Thermal mo[    8.996916][    T1] Performance Events: PEBS fmt3+, Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
[    8.999591][    T1] ... version:                4
[    9.004310][    T1] ... bit width:              48
[    9.009118][    T1] ... generic registers:      4
[    9.009574][    T1] ... value mask:             0000ffffffffffff
[    9.015601][    T1] ... max period:             00007fffffffffff
[    9.019574][    T1] ... fixed-purpose events:   3
[    9.024294][    T1] ... event mask:             000000070000000f
[    9.034357][    T1] rcu: Hierarchical SRCU implementation.
[    9.062516][    T5] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.

> 
> There are quite a bunch of issues to solve:
> 
>   - X86 does not use the device::msi_domain pointer for historical reasons
>     and due to XEN, which makes it impossible to create an architecture
>     agnostic device MSI infrastructure.
> 
>   - X86 has it's own msi_alloc_info data type which is pointlessly
>     different from the generic version and does not allow to share code.
> 
>   - The logic of composing MSI messages in an hierarchy is busted at the
>     core level and of course some (x86) drivers depend on that.
> 
>   - A few minor shortcomings as usual
> 
> This series addresses that in several steps:
> 
>  1) Accidental bug fixes
> 
>       iommu/amd: Prevent NULL pointer dereference
> 
>  2) Janitoring
> 
>       x86/init: Remove unused init ops
>       PCI: vmd: Dont abuse vector irqomain as parent
>       x86/msi: Remove pointless vcpu_affinity callback
> 
>  3) Sanitizing the composition of MSI messages in a hierarchy
>  
>       genirq/chip: Use the first chip in irq_chip_compose_msi_msg()
>       x86/msi: Move compose message callback where it belongs
> 
>  4) Simplification of the x86 specific interrupt allocation mechanism
> 
>       x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency
>       x86/irq: Add allocation type for parent domain retrieval
>       iommu/vt-d: Consolidate irq domain getter
>       iommu/amd: Consolidate irq domain getter
>       iommu/irq_remapping: Consolidate irq domain lookup
> 
>  5) Consolidation of the X86 specific interrupt allocation mechanism to be as
> close
>     as possible to the generic MSI allocation mechanism which allows to get
> rid
>     of quite a bunch of x86'isms which are pointless
> 
>       x86/irq: Prepare consolidation of irq_alloc_info
>       x86/msi: Consolidate HPET allocation
>       x86/ioapic: Consolidate IOAPIC allocation
>       x86/irq: Consolidate DMAR irq allocation
>       x86/irq: Consolidate UV domain allocation
>       PCI/MSI: Rework pci_msi_domain_calc_hwirq()
>       x86/msi: Consolidate MSI allocation
>       x86/msi: Use generic MSI domain ops
> 
>   6) x86 specific cleanups to remove the dependency on arch_*_msi_irqs()
> 
>       x86/irq: Move apic_post_init() invocation to one place
>       x86/pci: Reducde #ifdeffery in PCI init code
>       x86/irq: Initialize PCI/MSI domain at PCI init time
>       irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
>       PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
>       PCI/MSI: Provide pci_dev_has_special_msi_domain() helper
>       x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
>       x86/xen: Rework MSI teardown
>       x86/xen: Consolidate XEN-MSI init
>       irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
>       x86/xen: Wrap XEN MSI management into irqdomain
>       iommm/vt-d: Store irq domain in struct device
>       iommm/amd: Store irq domain in struct device
>       x86/pci: Set default irq domain in pcibios_add_device()
>       PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
>       x86/irq: Cleanup the arch_*_msi_irqs() leftovers
>       x86/irq: Make most MSI ops XEN private
>       iommu/vt-d: Remove domain search for PCI/MSI[X]
>       iommu/amd: Remove domain search for PCI/MSI
> 
>   7) X86 specific preparation for device MSI
> 
>       x86/irq: Add DEV_MSI allocation type
>       x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI
> 
>   8) Generic device MSI infrastructure
>       platform-msi: Provide default irq_chip:: Ack
>       genirq/proc: Take buslock on affinity write
>       genirq/msi: Provide and use msi_domain_set_default_info_flags()
>       platform-msi: Add device MSI infrastructure
>       irqdomain/msi: Provide msi_alloc/free_store() callbacks
> 
>   9) POC of IMS (Interrupt Message Storm) irq domain and irqchip
>      implementations for both device array and queue storage.
> 
>       irqchip: Add IMS (Interrupt Message Storm) driver - NOT FOR MERGING
> 
> Changes vs. V1:
> 
>    - Addressed various review comments and addressed the 0day fallout.
>      - Corrected the XEN logic (Jürgen)
>      - Make the arch fallback in PCI/MSI opt-in not opt-out (Bjorn)
> 
>    - Fixed the compose MSI message inconsistency
> 
>    - Ensure that the necessary flags are set for device SMI
> 
>    - Make the irq bus logic work for affinity setting to prepare
>      support for IMS storage in queue memory. It turned out to be
>      less scary than I feared.
> 
>    - Remove leftovers in iommu/intel|amd
> 
>    - Reworked the IMS POC driver to cover queue storage so Jason can have a
>      look whether that fits the needs of MLX devices.
> 
> The whole lot is also available from git:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git device-msi
> 
> This has been tested on Intel/AMD/KVM but lacks testing on:
> 
>     - HYPERV (-ENODEV)
>     - VMD enabled systems (-ENODEV)
>     - XEN (-ENOCLUE)
>     - IMS (-ENODEV)
> 
>     - Any non-X86 code which might depend on the broken compose MSI message
>       logic. Marc excpects not much fallout, but agrees that we need to fix
>       it anyway.
> 
> #1 - #3 should be applied unconditionally for obvious reasons
> #4 - #6 are wortwhile cleanups which should be done independent of device MSI
> 
> #7 - #8 look promising to cleanup the platform MSI implementation
>      	independent of #8, but I neither had cycles nor the stomach to
>      	tackle that.
> 
> #9	is obviously just for the folks interested in IMS
> 
> Thanks,
> 
> 	tglx


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-09-25 15:29 ` Qian Cai
@ 2020-09-25 15:49   ` Peter Zijlstra
  2020-09-25 23:14     ` Thomas Gleixner
  0 siblings, 1 reply; 112+ messages in thread
From: Peter Zijlstra @ 2020-09-25 15:49 UTC (permalink / raw)
  To: Qian Cai
  Cc: Thomas Gleixner, LKML, Stephen Rothwell, linux-next, x86,
	Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang, Jon Derrick,
	Lu Baolu, Wei Liu, K. Y. Srinivasan, Stephen Hemminger,
	Steve Wahl, Dimitri Sivanich, Russ Anderson, linux-pci,
	Bjorn Helgaas, Lorenzo Pieralisi, Konrad Rzeszutek Wilk,
	xen-devel, Juergen Gross, Boris Ostrovsky, Stefano Stabellini,
	Marc Zyngier, Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

On Fri, Sep 25, 2020 at 11:29:13AM -0400, Qian Cai wrote:

> It looks like the crashes happen in the interrupt remapping code where they are
> only able to to generate partial call traces.

> [    8.466614][    T0] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [    8.474295][    T0] #PF: supervisor instruction fetch in kernel mode
> [    8.480669][    T0] #PF: error_code(0x0010) - not-present page
> [    8.486518][    T0] PGD 0 P4D 0 
> [    8.489757][    T0] Oops: 0010 [#1] SMP KASAN PTI
> [    8.494476][    T0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I       5.9.0-rc6-next-20200925 #2
> [    8.503987][    T0] Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10, BIOS U34 11/13/2019
> [    8.513238][    T0] RIP: 0010:0x0
> [    8.516562][    T0] Code: Bad RIP v

Here it looks like this:

[    1.830276] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    1.838043] #PF: supervisor instruction fetch in kernel mode
[    1.844357] #PF: error_code(0x0010) - not-present page
[    1.850090] PGD 0 P4D 0
[    1.852915] Oops: 0010 [#1] SMP
[    1.856419] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc6-00700-g0248dedd12d4 #419
[    1.865447] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[    1.876902] RIP: 0010:0x0
[    1.879824] Code: Bad RIP value.
[    1.883423] RSP: 0000:ffffffff82803da0 EFLAGS: 00010282
[    1.889251] RAX: 0000000000000000 RBX: ffffffff8282b980 RCX: ffffffff82803e40
[    1.897241] RDX: 0000000000000001 RSI: ffffffff82803e40 RDI: ffffffff8282b980
[    1.905201] RBP: ffff88842f331000 R08: 00000000ffffffff R09: 0000000000000001
[    1.913162] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000048
[    1.921123] R13: ffffffff82803e40 R14: ffffffff8282b9c0 R15: 0000000000000000
[    1.929085] FS:  0000000000000000(0000) GS:ffff88842f400000(0000) knlGS:0000000000000000
[    1.938113] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.944524] CR2: ffffffffffffffd6 CR3: 0000000002811001 CR4: 00000000000606b0
[    1.952484] Call Trace:
[    1.955214]  msi_domain_alloc+0x36/0x130
[    1.959594]  __irq_domain_alloc_irqs+0x165/0x380
[    1.964748]  dmar_alloc_hwirq+0x9a/0x120
[    1.969127]  dmar_set_interrupt.part.0+0x1c/0x60
[    1.974281]  enable_drhd_fault_handling+0x2c/0x6c
[    1.979532]  apic_intr_mode_init+0xfa/0x100
[    1.984191]  x86_late_time_init+0x20/0x30
[    1.988662]  start_kernel+0x723/0x7e6
[    1.992748]  secondary_startup_64_no_verify+0xa6/0xab
[    1.998386] Modules linked in:
[    2.001794] CR2: 0000000000000000
[    2.005510] ---[ end trace 837dc60d7c66efa2 ]---


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-09-25 15:49   ` Peter Zijlstra
@ 2020-09-25 23:14     ` Thomas Gleixner
  2020-09-27  8:46       ` [PATCH] x86/apic/msi: Unbreak DMAR and HPET MSI Thomas Gleixner
  0 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-09-25 23:14 UTC (permalink / raw)
  To: Peter Zijlstra, Qian Cai
  Cc: LKML, Stephen Rothwell, linux-next, x86, Joerg Roedel, iommu,
	linux-hyperv, Haiyang Zhang, Jon Derrick, Lu Baolu, Wei Liu,
	K. Y. Srinivasan, Stephen Hemminger, Steve Wahl,
	Dimitri Sivanich, Russ Anderson, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Konrad Rzeszutek Wilk, xen-devel,
	Juergen Gross, Boris Ostrovsky, Stefano Stabellini, Marc Zyngier,
	Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

On Fri, Sep 25 2020 at 17:49, Peter Zijlstra wrote:
> Here it looks like this:
>
> [    1.830276] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [    1.838043] #PF: supervisor instruction fetch in kernel mode
> [    1.844357] #PF: error_code(0x0010) - not-present page
> [    1.850090] PGD 0 P4D 0
> [    1.852915] Oops: 0010 [#1] SMP
> [    1.856419] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc6-00700-g0248dedd12d4 #419
> [    1.865447] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> [    1.876902] RIP: 0010:0x0
> [    1.879824] Code: Bad RIP value.
> [    1.883423] RSP: 0000:ffffffff82803da0 EFLAGS: 00010282
> [    1.889251] RAX: 0000000000000000 RBX: ffffffff8282b980 RCX: ffffffff82803e40
> [    1.897241] RDX: 0000000000000001 RSI: ffffffff82803e40 RDI: ffffffff8282b980
> [    1.905201] RBP: ffff88842f331000 R08: 00000000ffffffff R09: 0000000000000001
> [    1.913162] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000048
> [    1.921123] R13: ffffffff82803e40 R14: ffffffff8282b9c0 R15: 0000000000000000
> [    1.929085] FS:  0000000000000000(0000) GS:ffff88842f400000(0000) knlGS:0000000000000000
> [    1.938113] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.944524] CR2: ffffffffffffffd6 CR3: 0000000002811001 CR4: 00000000000606b0
> [    1.952484] Call Trace:
> [    1.955214]  msi_domain_alloc+0x36/0x130

Hrm. That looks like a not initialized mandatory callback. Confused.

Is this on -next and if so, does this happen on tip:x86/irq as well?

Can you provide yoru config please?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-09-25 13:54   ` Qian Cai
@ 2020-09-26 12:38     ` Vasily Gorbik
  2020-09-28 10:11       ` Thomas Gleixner
  0 siblings, 1 reply; 112+ messages in thread
From: Vasily Gorbik @ 2020-09-26 12:38 UTC (permalink / raw)
  To: Thomas Gleixner, Qian Cai
  Cc: LKML, Heiko Carstens, Christian Borntraeger, linux-s390,
	Stephen Rothwell, linux-next, x86, Joerg Roedel, iommu,
	linux-hyperv, Haiyang Zhang, Jon Derrick, Lu Baolu, Wei Liu,
	K. Y. Srinivasan, Stephen Hemminger, Steve Wahl,
	Dimitri Sivanich, Russ Anderson, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Konrad Rzeszutek Wilk, xen-devel,
	Juergen Gross, Boris Ostrovsky, Stefano Stabellini, Marc Zyngier,
	Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

On Fri, Sep 25, 2020 at 09:54:52AM -0400, Qian Cai wrote:
> On Wed, 2020-08-26 at 13:17 +0200, Thomas Gleixner wrote:
> > From: Thomas Gleixner <tglx@linutronix.de>
> > 
> > The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
> > requires them or not. Architectures which are fully utilizing hierarchical
> > irq domains should never call into that code.
> > 
> > It's not only architectures which depend on that by implementing one or
> > more of the weak functions, there is also a bunch of drivers which relies
> > on the weak functions which invoke msi_controller::setup_irq[s] and
> > msi_controller::teardown_irq.
> > 
> > Make the architectures and drivers which rely on them select them in Kconfig
> > and if not selected replace them by stub functions which emit a warning and
> > fail the PCI/MSI interrupt allocation.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> Today's linux-next will have some warnings on s390x:
> 
> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/s390.config
> 
> WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS
>   Depends on [n]: PCI [=n]
>   Selected by [y]:
>   - S390 [=y]
> 
> WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS
>   Depends on [n]: PCI [=n]
>   Selected by [y]:
>   - S390 [=y]
>

Yes, as well as on mips and sparc which also don't FORCE_PCI.
This seems to work for s390:

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index b0b7acf07eb8..41136fbe909b 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -192,3 +192,3 @@ config S390
        select PCI_MSI                  if PCI
-       select PCI_MSI_ARCH_FALLBACKS
+       select PCI_MSI_ARCH_FALLBACKS   if PCI
        select SET_FS

^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH] x86/apic/msi: Unbreak DMAR and HPET MSI
  2020-09-25 23:14     ` Thomas Gleixner
@ 2020-09-27  8:46       ` Thomas Gleixner
  0 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-09-27  8:46 UTC (permalink / raw)
  To: Peter Zijlstra, Qian Cai
  Cc: LKML, Stephen Rothwell, linux-next, x86, Joerg Roedel, iommu,
	linux-hyperv, Haiyang Zhang, Jon Derrick, Lu Baolu, Wei Liu,
	K. Y. Srinivasan, Stephen Hemminger, Steve Wahl,
	Dimitri Sivanich, Russ Anderson, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Konrad Rzeszutek Wilk, xen-devel,
	Juergen Gross, Boris Ostrovsky, Stefano Stabellini, Marc Zyngier,
	Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

Switching the DMAR and HPET MSI code to use the generic MSI domain ops
missed to add the flag which tells the core code to update the domain
operations with the defaults. As a consequence the core code crashes
when an interrupt in one of those domains is allocated.

Add the missing flags.

Fixes: 9006c133a422 ("x86/msi: Use generic MSI domain ops")
Reported-by: Qian Cai <cai@redhat.com> 
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/msi.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -309,6 +309,7 @@ static struct msi_domain_ops dmar_msi_do
 static struct msi_domain_info dmar_msi_domain_info = {
 	.ops		= &dmar_msi_domain_ops,
 	.chip		= &dmar_msi_controller,
+	.flags		= MSI_FLAG_USE_DEF_DOM_OPS,
 };
 
 static struct irq_domain *dmar_get_irq_domain(void)
@@ -408,6 +409,7 @@ static struct msi_domain_ops hpet_msi_do
 static struct msi_domain_info hpet_msi_domain_info = {
 	.ops		= &hpet_msi_domain_ops,
 	.chip		= &hpet_msi_controller,
+	.flags		= MSI_FLAG_USE_DEF_DOM_OPS,
 };
 
 struct irq_domain *hpet_create_irq_domain(int hpet_id)

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  2020-09-26 12:38     ` Vasily Gorbik
@ 2020-09-28 10:11       ` Thomas Gleixner
  0 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-09-28 10:11 UTC (permalink / raw)
  To: Vasily Gorbik, Qian Cai
  Cc: LKML, Heiko Carstens, Christian Borntraeger, linux-s390,
	Stephen Rothwell, linux-next, x86, Joerg Roedel, iommu,
	linux-hyperv, Haiyang Zhang, Jon Derrick, Lu Baolu, Wei Liu,
	K. Y. Srinivasan, Stephen Hemminger, Steve Wahl,
	Dimitri Sivanich, Russ Anderson, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Konrad Rzeszutek Wilk, xen-devel,
	Juergen Gross, Boris Ostrovsky, Stefano Stabellini, Marc Zyngier,
	Greg Kroah-Hartman, Rafael J. Wysocki, Megha Dey,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams

On Sat, Sep 26 2020 at 14:38, Vasily Gorbik wrote:
> On Fri, Sep 25, 2020 at 09:54:52AM -0400, Qian Cai wrote:
> Yes, as well as on mips and sparc which also don't FORCE_PCI.
> This seems to work for s390:
>
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index b0b7acf07eb8..41136fbe909b 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -192,3 +192,3 @@ config S390
>         select PCI_MSI                  if PCI
> -       select PCI_MSI_ARCH_FALLBACKS
> +       select PCI_MSI_ARCH_FALLBACKS   if PCI
>         select SET_FS

lemme fix that for all of them ...

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (51 preceding siblings ...)
  2020-09-25 15:29 ` Qian Cai
@ 2020-09-29 23:03 ` Dey, Megha
  2020-09-30  6:41   ` Thomas Gleixner
  2020-11-12 12:55 ` REGRESSION: " Jason Gunthorpe
  53 siblings, 1 reply; 112+ messages in thread
From: Dey, Megha @ 2020-09-29 23:03 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams, dave.jiang,
	ravi.v.shankar

Hi Thomas,

On 8/26/2020 4:16 AM, Thomas Gleixner wrote:
> This is the second version of providing a base to support device MSI (non
> PCI based) and on top of that support for IMS (Interrupt Message Storm)
> based devices in a halfways architecture independent way.
>
> The first version can be found here:
>
>      https://lore.kernel.org/r/20200821002424.119492231@linutronix.de
>
> It's still a mixed bag of bug fixes, cleanups and general improvements
> which are worthwhile independent of device MSI.
>
> There are quite a bunch of issues to solve:
>
>    - X86 does not use the device::msi_domain pointer for historical reasons
>      and due to XEN, which makes it impossible to create an architecture
>      agnostic device MSI infrastructure.
>
>    - X86 has it's own msi_alloc_info data type which is pointlessly
>      different from the generic version and does not allow to share code.
>
>    - The logic of composing MSI messages in an hierarchy is busted at the
>      core level and of course some (x86) drivers depend on that.
>
>    - A few minor shortcomings as usual
>
> This series addresses that in several steps:
>
>   1) Accidental bug fixes
>
>        iommu/amd: Prevent NULL pointer dereference
>
>   2) Janitoring
>
>        x86/init: Remove unused init ops
>        PCI: vmd: Dont abuse vector irqomain as parent
>        x86/msi: Remove pointless vcpu_affinity callback
>
>   3) Sanitizing the composition of MSI messages in a hierarchy
>   
>        genirq/chip: Use the first chip in irq_chip_compose_msi_msg()
>        x86/msi: Move compose message callback where it belongs
>
>   4) Simplification of the x86 specific interrupt allocation mechanism
>
>        x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency
>        x86/irq: Add allocation type for parent domain retrieval
>        iommu/vt-d: Consolidate irq domain getter
>        iommu/amd: Consolidate irq domain getter
>        iommu/irq_remapping: Consolidate irq domain lookup
>
>   5) Consolidation of the X86 specific interrupt allocation mechanism to be as close
>      as possible to the generic MSI allocation mechanism which allows to get rid
>      of quite a bunch of x86'isms which are pointless
>
>        x86/irq: Prepare consolidation of irq_alloc_info
>        x86/msi: Consolidate HPET allocation
>        x86/ioapic: Consolidate IOAPIC allocation
>        x86/irq: Consolidate DMAR irq allocation
>        x86/irq: Consolidate UV domain allocation
>        PCI/MSI: Rework pci_msi_domain_calc_hwirq()
>        x86/msi: Consolidate MSI allocation
>        x86/msi: Use generic MSI domain ops
>
>    6) x86 specific cleanups to remove the dependency on arch_*_msi_irqs()
>
>        x86/irq: Move apic_post_init() invocation to one place
>        x86/pci: Reducde #ifdeffery in PCI init code
>        x86/irq: Initialize PCI/MSI domain at PCI init time
>        irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
>        PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
>        PCI/MSI: Provide pci_dev_has_special_msi_domain() helper
>        x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
>        x86/xen: Rework MSI teardown
>        x86/xen: Consolidate XEN-MSI init
>        irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
>        x86/xen: Wrap XEN MSI management into irqdomain
>        iommm/vt-d: Store irq domain in struct device
>        iommm/amd: Store irq domain in struct device
>        x86/pci: Set default irq domain in pcibios_add_device()
>        PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
>        x86/irq: Cleanup the arch_*_msi_irqs() leftovers
>        x86/irq: Make most MSI ops XEN private
>        iommu/vt-d: Remove domain search for PCI/MSI[X]
>        iommu/amd: Remove domain search for PCI/MSI
>
>    7) X86 specific preparation for device MSI
>
>        x86/irq: Add DEV_MSI allocation type
>        x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI
>
>    8) Generic device MSI infrastructure
>        platform-msi: Provide default irq_chip:: Ack
>        genirq/proc: Take buslock on affinity write
>        genirq/msi: Provide and use msi_domain_set_default_info_flags()
>        platform-msi: Add device MSI infrastructure
>        irqdomain/msi: Provide msi_alloc/free_store() callbacks
>
>    9) POC of IMS (Interrupt Message Storm) irq domain and irqchip
>       implementations for both device array and queue storage.
>
>        irqchip: Add IMS (Interrupt Message Storm) driver - NOT FOR MERGING
>
> Changes vs. V1:
>
>     - Addressed various review comments and addressed the 0day fallout.
>       - Corrected the XEN logic (Jürgen)
>       - Make the arch fallback in PCI/MSI opt-in not opt-out (Bjorn)
>
>     - Fixed the compose MSI message inconsistency
>
>     - Ensure that the necessary flags are set for device SMI
>
>     - Make the irq bus logic work for affinity setting to prepare
>       support for IMS storage in queue memory. It turned out to be
>       less scary than I feared.
>
>     - Remove leftovers in iommu/intel|amd
>
>     - Reworked the IMS POC driver to cover queue storage so Jason can have a
>       look whether that fits the needs of MLX devices.
>
> The whole lot is also available from git:
>
>     git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git device-msi
>
> This has been tested on Intel/AMD/KVM but lacks testing on:
>
>      - HYPERV (-ENODEV)
>      - VMD enabled systems (-ENODEV)
>      - XEN (-ENOCLUE)
>      - IMS (-ENODEV)
>
>      - Any non-X86 code which might depend on the broken compose MSI message
>        logic. Marc excpects not much fallout, but agrees that we need to fix
>        it anyway.
>
> #1 - #3 should be applied unconditionally for obvious reasons
> #4 - #6 are wortwhile cleanups which should be done independent of device MSI
>
> #7 - #8 look promising to cleanup the platform MSI implementation
>       	independent of #8, but I neither had cycles nor the stomach to
>       	tackle that.
>
> #9	is obviously just for the folks interested in IMS
>
> Thanks,
>
> 	tglx

I see that the tip tree (as of 9/29) has most of these patches but 
notice that the DEV_MSI related patches

haven't made it. I have tested the tip tree(x86/irq branch) with your 
DEV_MSI infra patches and our IMS

patches with the IDXD driver and was wondering if we should push out 
those patches as part of our patchset?

Thanks,

Megha


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-09-29 23:03 ` [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Dey, Megha
@ 2020-09-30  6:41   ` Thomas Gleixner
  2020-09-30 11:43     ` Jason Gunthorpe
  0 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-09-30  6:41 UTC (permalink / raw)
  To: Dey, Megha, LKML
  Cc: x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams, dave.jiang,
	ravi.v.shankar

On Tue, Sep 29 2020 at 16:03, Megha Dey wrote:
> On 8/26/2020 4:16 AM, Thomas Gleixner wrote:
>> #9	is obviously just for the folks interested in IMS
>>
>
> I see that the tip tree (as of 9/29) has most of these patches but 
> notice that the DEV_MSI related patches
>
> haven't made it. I have tested the tip tree(x86/irq branch) with your
> DEV_MSI infra patches and our IMS patches with the IDXD driver and was

Your IMS patches? Why do you need something special again?

> wondering if we should push out those patches as part of our patchset?

As I don't have any hardware to test that, I was waiting for you and
Jason to confirm that this actually works for the two different IMS
implementations.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-09-30  6:41   ` Thomas Gleixner
@ 2020-09-30 11:43     ` Jason Gunthorpe
  2020-09-30 15:20       ` Thomas Gleixner
  0 siblings, 1 reply; 112+ messages in thread
From: Jason Gunthorpe @ 2020-09-30 11:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dey, Megha, LKML, x86, Joerg Roedel, iommu, linux-hyperv,
	Haiyang Zhang, Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams, ravi.v.shankar

On Wed, Sep 30, 2020 at 08:41:48AM +0200, Thomas Gleixner wrote:
> On Tue, Sep 29 2020 at 16:03, Megha Dey wrote:
> > On 8/26/2020 4:16 AM, Thomas Gleixner wrote:
> >> #9	is obviously just for the folks interested in IMS
> >>
> >
> > I see that the tip tree (as of 9/29) has most of these patches but 
> > notice that the DEV_MSI related patches
> >
> > haven't made it. I have tested the tip tree(x86/irq branch) with your
> > DEV_MSI infra patches and our IMS patches with the IDXD driver and was
> 
> Your IMS patches? Why do you need something special again?
> 
> > wondering if we should push out those patches as part of our patchset?
> 
> As I don't have any hardware to test that, I was waiting for you and
> Jason to confirm that this actually works for the two different IMS
> implementations.

How urgently do you need this? The code looked good from what I
understood. It will be a while before we have all the parts to send an
actual patch though.

We might be able to put together a mockup just to prove it

Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 24/46] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
  2020-08-31 14:39   ` Jason Gunthorpe
@ 2020-09-30 12:45     ` Derrick, Jonathan
  2020-09-30 12:57       ` Jason Gunthorpe
  0 siblings, 1 reply; 112+ messages in thread
From: Derrick, Jonathan @ 2020-09-30 12:45 UTC (permalink / raw)
  To: tglx, jgg
  Cc: Williams, Dan J, sivanich, wei.liu, haiyangz, Dey, Megha, Lu,
	Baolu, Jiang, Dave, Tian, Kevin, jgross, kys, sstabellini,
	linux-kernel, x86, rafael, xen-devel, iommu, maz, bhelgaas,
	linux-pci, konrad.wilk, alex.williamson, steve.wahl,
	boris.ostrovsky, gregkh, rja, joro, sthemmin, Pan, Jacob jun,
	lorenzo.pieralisi, linux-hyperv, baolu.lu

Hi Jason

On Mon, 2020-08-31 at 11:39 -0300, Jason Gunthorpe wrote:
> On Wed, Aug 26, 2020 at 01:16:52PM +0200, Thomas Gleixner wrote:
> > From: Thomas Gleixner <tglx@linutronix.de>
> > 
> > Devices on the VMD bus use their own MSI irq domain, but it is not
> > distinguishable from regular PCI/MSI irq domains. This is required
> > to exclude VMD devices from getting the irq domain pointer set by
> > interrupt remapping.
> > 
> > Override the default bus token.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> >  drivers/pci/controller/vmd.c |    6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > +++ b/drivers/pci/controller/vmd.c
> > @@ -579,6 +579,12 @@ static int vmd_enable_domain(struct vmd_
> >  		return -ENODEV;
> >  	}
> >  
> > +	/*
> > +	 * Override the irq domain bus token so the domain can be distinguished
> > +	 * from a regular PCI/MSI domain.
> > +	 */
> > +	irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
> > +
> 
> Having the non-transparent-bridge hold a MSI table and
> multiplex/de-multiplex IRQs looks like another good use case for
> something close to pci_subdevice_msi_create_irq_domain()?
> 
> If each device could have its own internal MSI-X table programmed
> properly things would work alot better. Disable capture/remap of the
> MSI range in the NTB.
We can disable the capture and remap in newer devices so we don't even
need the irq domain. Legacy VMD will automatically remap based on the
APIC dest bits in the MSI address.

I would however like to determine if the MSI data bits could be used
for individual devices to have the IRQ domain subsystem demultiplex the
virq from that and eliminate the IRQ list iteration.

A concern I have about that scheme is virtualization as I think those
bits are used to route to the virtual vector.

> 
> Then each device could have a proper non-multiplexed interrupt
> delivered to the CPU.. Affinity would work properly, no need to share
> IRQs (eg vmd_irq() goes away)/etc.
> 
> Something for the VMD maintainers to think about at least.
> 
> As I hear more about NTB these days a full MSI scheme for NTB seems
> interesting to have in the PCI-E core code..
> 
> Jason



^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 24/46] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
  2020-09-30 12:45     ` Derrick, Jonathan
@ 2020-09-30 12:57       ` Jason Gunthorpe
  2020-09-30 13:08         ` Derrick, Jonathan
  0 siblings, 1 reply; 112+ messages in thread
From: Jason Gunthorpe @ 2020-09-30 12:57 UTC (permalink / raw)
  To: Derrick, Jonathan
  Cc: tglx, Williams, Dan J, sivanich, wei.liu, haiyangz, Dey, Megha,
	Lu, Baolu, Jiang, Dave, Tian, Kevin, jgross, kys, sstabellini,
	linux-kernel, x86, rafael, xen-devel, iommu, maz, bhelgaas,
	linux-pci, konrad.wilk, alex.williamson, steve.wahl,
	boris.ostrovsky, gregkh, rja, joro, sthemmin, Pan, Jacob jun,
	lorenzo.pieralisi, linux-hyperv, baolu.lu

On Wed, Sep 30, 2020 at 12:45:30PM +0000, Derrick, Jonathan wrote:
> Hi Jason
> 
> On Mon, 2020-08-31 at 11:39 -0300, Jason Gunthorpe wrote:
> > On Wed, Aug 26, 2020 at 01:16:52PM +0200, Thomas Gleixner wrote:
> > > From: Thomas Gleixner <tglx@linutronix.de>
> > > 
> > > Devices on the VMD bus use their own MSI irq domain, but it is not
> > > distinguishable from regular PCI/MSI irq domains. This is required
> > > to exclude VMD devices from getting the irq domain pointer set by
> > > interrupt remapping.
> > > 
> > > Override the default bus token.
> > > 
> > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> > >  drivers/pci/controller/vmd.c |    6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > +++ b/drivers/pci/controller/vmd.c
> > > @@ -579,6 +579,12 @@ static int vmd_enable_domain(struct vmd_
> > >  		return -ENODEV;
> > >  	}
> > >  
> > > +	/*
> > > +	 * Override the irq domain bus token so the domain can be distinguished
> > > +	 * from a regular PCI/MSI domain.
> > > +	 */
> > > +	irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
> > > +
> > 
> > Having the non-transparent-bridge hold a MSI table and
> > multiplex/de-multiplex IRQs looks like another good use case for
> > something close to pci_subdevice_msi_create_irq_domain()?
> > 
> > If each device could have its own internal MSI-X table programmed
> > properly things would work alot better. Disable capture/remap of the
> > MSI range in the NTB.

> We can disable the capture and remap in newer devices so we don't even
> need the irq domain.

You'd still need an irq domain, it just comes from
pci_subdevice_msi_create_irq_domain() instead of internal to this
driver.

> I would however like to determine if the MSI data bits could be used
> for individual devices to have the IRQ domain subsystem demultiplex the
> virq from that and eliminate the IRQ list iteration.

Yes, exactly. This new pci_subdevice_msi_create_irq_domain() creates
*proper* fully functional interrupts, no need for any IRQ handler in
this driver.

> A concern I have about that scheme is virtualization as I think those
> bits are used to route to the virtual vector.

It should be fine with this patch series, consult with Megha how
virtualization is working with IDXD/etc

Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 24/46] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
  2020-09-30 12:57       ` Jason Gunthorpe
@ 2020-09-30 13:08         ` Derrick, Jonathan
  2020-09-30 18:47           ` Jason Gunthorpe
  0 siblings, 1 reply; 112+ messages in thread
From: Derrick, Jonathan @ 2020-09-30 13:08 UTC (permalink / raw)
  To: jgg
  Cc: tglx, Williams, Dan J, sivanich, jgross, haiyangz, Dey, Megha,
	kys, Jiang, Dave, Tian, Kevin, wei.liu, Lu, Baolu, sstabellini,
	linux-kernel, x86, rafael, xen-devel, iommu, maz, bhelgaas,
	linux-pci, konrad.wilk, alex.williamson, steve.wahl,
	boris.ostrovsky, gregkh, rja, joro, Pan, Jacob jun, sthemmin,
	linux-hyperv, lorenzo.pieralisi, baolu.lu

+Megha

On Wed, 2020-09-30 at 09:57 -0300, Jason Gunthorpe wrote:
> On Wed, Sep 30, 2020 at 12:45:30PM +0000, Derrick, Jonathan wrote:
> > Hi Jason
> > 
> > On Mon, 2020-08-31 at 11:39 -0300, Jason Gunthorpe wrote:
> > > On Wed, Aug 26, 2020 at 01:16:52PM +0200, Thomas Gleixner wrote:
> > > > From: Thomas Gleixner <tglx@linutronix.de>
> > > > 
> > > > Devices on the VMD bus use their own MSI irq domain, but it is not
> > > > distinguishable from regular PCI/MSI irq domains. This is required
> > > > to exclude VMD devices from getting the irq domain pointer set by
> > > > interrupt remapping.
> > > > 
> > > > Override the default bus token.
> > > > 
> > > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > > > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> > > >  drivers/pci/controller/vmd.c |    6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > > 
> > > > +++ b/drivers/pci/controller/vmd.c
> > > > @@ -579,6 +579,12 @@ static int vmd_enable_domain(struct vmd_
> > > >  		return -ENODEV;
> > > >  	}
> > > >  
> > > > +	/*
> > > > +	 * Override the irq domain bus token so the domain can be distinguished
> > > > +	 * from a regular PCI/MSI domain.
> > > > +	 */
> > > > +	irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
> > > > +
> > > 
> > > Having the non-transparent-bridge hold a MSI table and
> > > multiplex/de-multiplex IRQs looks like another good use case for
> > > something close to pci_subdevice_msi_create_irq_domain()?
> > > 
> > > If each device could have its own internal MSI-X table programmed
> > > properly things would work alot better. Disable capture/remap of the
> > > MSI range in the NTB.
> > We can disable the capture and remap in newer devices so we don't even
> > need the irq domain.
> 
> You'd still need an irq domain, it just comes from
> pci_subdevice_msi_create_irq_domain() instead of internal to this
> driver.
I have this set which disables remapping and bypasses the creation of
the IRQ domain:
https://patchwork.ozlabs.org/project/linux-pci/list/?series=192936

It allows the end-devices like NVMe to directly manager their own
interrupts and eliminates the VMD interrupt completely. The only quirk
was that kernel has to program IRTE with the VMD device ID as it still
remaps Requester-ID from subdevices.

> 
> > I would however like to determine if the MSI data bits could be used
> > for individual devices to have the IRQ domain subsystem demultiplex the
> > virq from that and eliminate the IRQ list iteration.
> 
> Yes, exactly. This new pci_subdevice_msi_create_irq_domain() creates
> *proper* fully functional interrupts, no need for any IRQ handler in
> this driver.
> 
> > A concern I have about that scheme is virtualization as I think those
> > bits are used to route to the virtual vector.
> 
> It should be fine with this patch series, consult with Megha how
> virtualization is working with IDXD/etc
> 
> Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-09-30 11:43     ` Jason Gunthorpe
@ 2020-09-30 15:20       ` Thomas Gleixner
  2020-09-30 17:25         ` Dey, Megha
  0 siblings, 1 reply; 112+ messages in thread
From: Thomas Gleixner @ 2020-09-30 15:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dey, Megha, LKML, x86, Joerg Roedel, iommu, linux-hyperv,
	Haiyang Zhang, Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams, ravi.v.shankar

On Wed, Sep 30 2020 at 08:43, Jason Gunthorpe wrote:
> On Wed, Sep 30, 2020 at 08:41:48AM +0200, Thomas Gleixner wrote:
>> On Tue, Sep 29 2020 at 16:03, Megha Dey wrote:
>> > On 8/26/2020 4:16 AM, Thomas Gleixner wrote:
>> >> #9	is obviously just for the folks interested in IMS
>> >>
>> >
>> > I see that the tip tree (as of 9/29) has most of these patches but 
>> > notice that the DEV_MSI related patches
>> >
>> > haven't made it. I have tested the tip tree(x86/irq branch) with your
>> > DEV_MSI infra patches and our IMS patches with the IDXD driver and was
>> 
>> Your IMS patches? Why do you need something special again?
>> 
>> > wondering if we should push out those patches as part of our patchset?
>> 
>> As I don't have any hardware to test that, I was waiting for you and
>> Jason to confirm that this actually works for the two different IMS
>> implementations.
>
> How urgently do you need this? The code looked good from what I
> understood. It will be a while before we have all the parts to send an
> actual patch though.

I personally do not need it at all :) Megha might have different
thoughts... 

> We might be able to put together a mockup just to prove it

If that makes Megha's stuff going that would of course be appreciated,
but we can defer the IMS_QUEUE part for later. It's orthogonal to the
IMS_ARRAY stuff.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-09-30 15:20       ` Thomas Gleixner
@ 2020-09-30 17:25         ` Dey, Megha
  2020-09-30 18:11           ` Thomas Gleixner
  0 siblings, 1 reply; 112+ messages in thread
From: Dey, Megha @ 2020-09-30 17:25 UTC (permalink / raw)
  To: Thomas Gleixner, Jason Gunthorpe
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams, ravi.v.shankar

Hi Thomas/Jason,

On 9/30/2020 8:20 AM, Thomas Gleixner wrote:
> On Wed, Sep 30 2020 at 08:43, Jason Gunthorpe wrote:
>> On Wed, Sep 30, 2020 at 08:41:48AM +0200, Thomas Gleixner wrote:
>>> On Tue, Sep 29 2020 at 16:03, Megha Dey wrote:
>>>> On 8/26/2020 4:16 AM, Thomas Gleixner wrote:
>>>>> #9	is obviously just for the folks interested in IMS
>>>>>
>>>> I see that the tip tree (as of 9/29) has most of these patches but
>>>> notice that the DEV_MSI related patches
>>>>
>>>> haven't made it. I have tested the tip tree(x86/irq branch) with your
>>>> DEV_MSI infra patches and our IMS patches with the IDXD driver and was
>>> Your IMS patches? Why do you need something special again?

By IMS patches, I meant your IMS driver patch that was updated (as it 
was untested, it had some compile

errors and we removed the IMS_QUEUE parts) :

https://lore.kernel.org/lkml/160021246221.67751.16280230469654363209.stgit@djiang5-desk3.ch.intel.com/

and some iommu related changes required by IMS.

https://lore.kernel.org/lkml/160021246905.67751.1674517279122764758.stgit@djiang5-desk3.ch.intel.com/

The whole patchset can be found here:

https://lore.kernel.org/lkml/f4a085f1-f6de-2539-12fe-c7308d243a4a@intel.com/

It would be great if you could review the IMS patches :)

>>>
>>>> wondering if we should push out those patches as part of our patchset?
>>> As I don't have any hardware to test that, I was waiting for you and
>>> Jason to confirm that this actually works for the two different IMS
>>> implementations.
>> How urgently do you need this? The code looked good from what I
>> understood. It will be a while before we have all the parts to send an
>> actual patch though.
> I personally do not need it at all :) Megha might have different
> thoughts...

I have tested these patches and it works fine (I had to add a couple of 
EXPORT_SYMBOLS).

We were hoping to get IMS in the 5.10 merge window :)

>
>> We might be able to put together a mockup just to prove it
> If that makes Megha's stuff going that would of course be appreciated,
> but we can defer the IMS_QUEUE part for later. It's orthogonal to the
> IMS_ARRAY stuff.

In our patch series, we have removed the IMS_QUEUE stuff and retained 
only the IMS_ARRAY parts

as that was sufficient for us.

>
> Thanks,
>
>          tglx

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-09-30 17:25         ` Dey, Megha
@ 2020-09-30 18:11           ` Thomas Gleixner
  0 siblings, 0 replies; 112+ messages in thread
From: Thomas Gleixner @ 2020-09-30 18:11 UTC (permalink / raw)
  To: Dey, Megha, Jason Gunthorpe
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Dave Jiang, Alex Williamson, Jacob Pan,
	Baolu Lu, Kevin Tian, Dan Williams, ravi.v.shankar

Megha,

On Wed, Sep 30 2020 at 10:25, Megha Dey wrote:
> On 9/30/2020 8:20 AM, Thomas Gleixner wrote:
>>>> Your IMS patches? Why do you need something special again?
>
> By IMS patches, I meant your IMS driver patch that was updated (as it 
> was untested, it had some compile errors and we removed the IMS_QUEUE
> parts) :

Ok.

> The whole patchset can be found here:
>
> https://lore.kernel.org/lkml/f4a085f1-f6de-2539-12fe-c7308d243a4a@intel.com/
>
> It would be great if you could review the IMS patches :)

It somehow slipped through the cracks. I'll have a look.

> We were hoping to get IMS in the 5.10 merge window :)

Hope dies last, right?

>>> We might be able to put together a mockup just to prove it
>> If that makes Megha's stuff going that would of course be appreciated,
>> but we can defer the IMS_QUEUE part for later. It's orthogonal to the
>> IMS_ARRAY stuff.
>
> In our patch series, we have removed the IMS_QUEUE stuff and retained 
> only the IMS_ARRAY parts > as that was sufficient for us.

That works. We can add that back when Jason has his puzzle pieces
sorted.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 24/46] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
  2020-09-30 13:08         ` Derrick, Jonathan
@ 2020-09-30 18:47           ` Jason Gunthorpe
  0 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2020-09-30 18:47 UTC (permalink / raw)
  To: Derrick, Jonathan
  Cc: tglx, Williams, Dan J, sivanich, jgross, haiyangz, Dey, Megha,
	kys, Jiang, Dave, Tian, Kevin, wei.liu, Lu, Baolu, sstabellini,
	linux-kernel, x86, rafael, xen-devel, iommu, maz, bhelgaas,
	linux-pci, konrad.wilk, alex.williamson, steve.wahl,
	boris.ostrovsky, gregkh, rja, joro, Pan, Jacob jun, sthemmin,
	linux-hyperv, lorenzo.pieralisi, baolu.lu

On Wed, Sep 30, 2020 at 01:08:27PM +0000, Derrick, Jonathan wrote:
> +Megha
> 
> On Wed, 2020-09-30 at 09:57 -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 30, 2020 at 12:45:30PM +0000, Derrick, Jonathan wrote:
> > > Hi Jason
> > > 
> > > On Mon, 2020-08-31 at 11:39 -0300, Jason Gunthorpe wrote:
> > > > On Wed, Aug 26, 2020 at 01:16:52PM +0200, Thomas Gleixner wrote:
> > > > > From: Thomas Gleixner <tglx@linutronix.de>
> > > > > 
> > > > > Devices on the VMD bus use their own MSI irq domain, but it is not
> > > > > distinguishable from regular PCI/MSI irq domains. This is required
> > > > > to exclude VMD devices from getting the irq domain pointer set by
> > > > > interrupt remapping.
> > > > > 
> > > > > Override the default bus token.
> > > > > 
> > > > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > > > > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> > > > >  drivers/pci/controller/vmd.c |    6 ++++++
> > > > >  1 file changed, 6 insertions(+)
> > > > > 
> > > > > +++ b/drivers/pci/controller/vmd.c
> > > > > @@ -579,6 +579,12 @@ static int vmd_enable_domain(struct vmd_
> > > > >  		return -ENODEV;
> > > > >  	}
> > > > >  
> > > > > +	/*
> > > > > +	 * Override the irq domain bus token so the domain can be distinguished
> > > > > +	 * from a regular PCI/MSI domain.
> > > > > +	 */
> > > > > +	irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
> > > > > +
> > > > 
> > > > Having the non-transparent-bridge hold a MSI table and
> > > > multiplex/de-multiplex IRQs looks like another good use case for
> > > > something close to pci_subdevice_msi_create_irq_domain()?
> > > > 
> > > > If each device could have its own internal MSI-X table programmed
> > > > properly things would work alot better. Disable capture/remap of the
> > > > MSI range in the NTB.
> > > We can disable the capture and remap in newer devices so we don't even
> > > need the irq domain.
> > 
> > You'd still need an irq domain, it just comes from
> > pci_subdevice_msi_create_irq_domain() instead of internal to this
> > driver.
> I have this set which disables remapping and bypasses the creation of
> the IRQ domain:
> https://patchwork.ozlabs.org/project/linux-pci/list/?series=192936

After Thomas's series VMD needs to supply a hierarchical IRQ domain to
get the correct PCI originator. This instead of the x86 patch in that
series. I think that domain should be created by
pci_subdevice_msi_create_irq_domain(), at least I'd start there.

Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* REGRESSION: Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
  2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
                   ` (52 preceding siblings ...)
  2020-09-29 23:03 ` [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Dey, Megha
@ 2020-11-12 12:55 ` Jason Gunthorpe
  53 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2020-11-12 12:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ziyad Atiyyeh, Itay Aveksis, Moshe Shemesh
  Cc: LKML, x86, Joerg Roedel, iommu, linux-hyperv, Haiyang Zhang,
	Jon Derrick, Lu Baolu, Wei Liu, K. Y. Srinivasan,
	Stephen Hemminger, Steve Wahl, Dimitri Sivanich, Russ Anderson,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Konrad Rzeszutek Wilk, xen-devel, Juergen Gross, Boris Ostrovsky,
	Stefano Stabellini, Marc Zyngier, Greg Kroah-Hartman,
	Rafael J. Wysocki, Megha Dey, Dave Jiang, Alex Williamson,
	Jacob Pan, Baolu Lu, Kevin Tian, Dan Williams

On Wed, Aug 26, 2020 at 01:16:28PM +0200, Thomas Gleixner wrote:
> This is the second version of providing a base to support device MSI (non
> PCI based) and on top of that support for IMS (Interrupt Message Storm)
> based devices in a halfways architecture independent way.

Hi Thomas,

Our test team has been struggling with a regression on bare metal
SRIOV VFs since -rc1 that they were able to bisect to this series

This commit tests good:

5712c3ed549e ("Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc")

This commit tests bad:

981aa1d366bf ("PCI: MSI: Fix Kconfig dependencies for PCI_MSI_ARCH_FALLBACKS")

They were unable to bisect further into the series because some of the
interior commits don't boot :(

When we try to load the mlx5 driver on a bare metal VF it gets this:

[Thu Oct 22 08:54:51 2020] DMAR: DRHD: handling fault status reg 2
[Thu Oct 22 08:54:51 2020] DMAR: [INTR-REMAP] Request device [42:00.2] fault index 1600 [fault reason 37] Blocked a compatibility format interrupt request
[Thu Oct 22 08:55:04 2020] mlx5_core 0000:42:00.1 eth4: Link down
[Thu Oct 22 08:55:11 2020] mlx5_core 0000:42:00.1 eth4: Link up
[Thu Oct 22 08:55:54 2020] mlx5_core 0000:42:00.2: mlx5_cmd_eq_recover:264:(pid 3390): Recovered 1 EQEs on cmd_eq
[Thu Oct 22 08:55:54 2020] mlx5_core 0000:42:00.2: wait_func_handle_exec_timeout:1051:(pid 3390): cmd0: CREATE_EQ(0×301) recovered after timeout
[Thu Oct 22 08:55:54 2020] DMAR: DRHD: handling fault status reg 102
[Thu Oct 22 08:55:54 2020] DMAR: [INTR-REMAP] Request device [42:00.2] fault index 1600 [fault reason 37] Blocked a compatibility format interrupt request

If you have any idea Ziyad and Itay can run any debugging you like.

I suppose it is because this series is handing out compatability
addr/data pairs while the IOMMU is setup to only accept remap ones
from SRIOV VFs?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 30/46] x86/xen: Wrap XEN MSI management into irqdomain
  2020-08-26 11:16 ` [patch V2 30/46] x86/xen: Wrap XEN MSI management into irqdomain Thomas Gleixner
@ 2023-01-15 14:12   ` David Woodhouse
  2023-01-15 20:27     ` David Woodhouse
  0 siblings, 1 reply; 112+ messages in thread
From: David Woodhouse @ 2023-01-15 14:12 UTC (permalink / raw)
  To: Thomas Gleixner, LKML, xen-devel
  Cc: Dimitri Sivanich, linux-hyperv, Steve Wahl, linux-pci,
	K. Y. Srinivasan, Dan Williams, Wei Liu, Stephen Hemminger,
	Baolu Lu, Marc Zyngier, x86, Jason Gunthorpe, Megha Dey,
	xen-devel, Kevin Tian, Konrad Rzeszutek Wilk, Haiyang Zhang,
	Alex Williamson, Stefano Stabellini, Bjorn Helgaas, Dave Jiang,
	Boris Ostrovsky, Jon Derrick, Juergen Gross, Russ Anderson,
	Greg Kroah-Hartman, iommu, Jacob Pan, Rafael J. Wysocki

[-- Attachment #1: Type: text/plain, Size: 1827 bytes --]

On Wed, 2020-08-26 at 13:16 +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> To allow utilizing the irq domain pointer in struct device it is necessary
> to make XEN/MSI irq domain compatible.
> 
> While the right solution would be to truly convert XEN to irq domains, this
> is an exercise which is not possible for mere mortals with limited XENology.
> 
> Provide a plain irqdomain wrapper around XEN. While this is blatant
> violation of the irqdomain design, it's the only solution for a XEN igorant
> person to make progress on the issue which triggered this change.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Acked-by: Juergen Gross <jgross@suse.com>

I think it broke MSI-X support, because xen_pci_msi_domain_info is
lacking a .flags = MSI_FLAGS_PCI_MSIX?

> ---
> Note: This is completely untested, but it compiles so it must be perfect.


I'm working on making it simple for you to test that, by hosting Xen
HVM guests natively in qemu (under KVM¹). 

But I'm absolutely not going to try hacking on both guest and host side
at the same time when I'm trying to ensure compatibility — that way
lies madness.

So for now I'm going to test qemu with older kernels, and maybe someone
(Jürgen}? can test MSI-X to PIRQ support under real Xen?) FWIW if I add
the missing MSI_FLAGS_PCI_MSIX flag then under my qemu I get:

 38:       3180          0  xen-pirq    -msi-x     ens4-rx-0
 39:          0       3610  xen-pirq    -msi-x     ens4-tx-0
 40:          1          0  xen-pirq    -msi-x     ens4

But without the flags I get:

[    8.464212] e1000e 0000:00:04.0 ens4: Failed to initialize MSI interrupts.  Falling back to legacy interrupts.

¹ https://lore.kernel.org/qemu-devel/20230110122042.1562155-1-dwmw2@infradead.org/

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch V2 30/46] x86/xen: Wrap XEN MSI management into irqdomain
  2023-01-15 14:12   ` David Woodhouse
@ 2023-01-15 20:27     ` David Woodhouse
  0 siblings, 0 replies; 112+ messages in thread
From: David Woodhouse @ 2023-01-15 20:27 UTC (permalink / raw)
  To: Thomas Gleixner, LKML, xen-devel
  Cc: Dimitri Sivanich, linux-hyperv, Steve Wahl, linux-pci,
	K. Y. Srinivasan, Dan Williams, Wei Liu, Stephen Hemminger,
	Baolu Lu, Marc Zyngier, x86, Jason Gunthorpe, Megha Dey,
	xen-devel, Kevin Tian, Konrad Rzeszutek Wilk, Haiyang Zhang,
	Alex Williamson, Stefano Stabellini, Bjorn Helgaas, Dave Jiang,
	Boris Ostrovsky, Jon Derrick, Juergen Gross, Russ Anderson,
	Greg Kroah-Hartman, iommu, Jacob Pan, Rafael J. Wysocki

[-- Attachment #1: Type: text/plain, Size: 2620 bytes --]

On Sun, 2023-01-15 at 14:12 +0000, David Woodhouse wrote:
> On Wed, 2020-08-26 at 13:16 +0200, Thomas Gleixner wrote:
> > From: Thomas Gleixner <tglx@linutronix.de>
> > 
> > To allow utilizing the irq domain pointer in struct device it is necessary
> > to make XEN/MSI irq domain compatible.
> > 
> > While the right solution would be to truly convert XEN to irq domains, this
> > is an exercise which is not possible for mere mortals with limited XENology.
> > 
> > Provide a plain irqdomain wrapper around XEN. While this is blatant
> > violation of the irqdomain design, it's the only solution for a XEN igorant
> > person to make progress on the issue which triggered this change.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > Acked-by: Juergen Gross <jgross@suse.com>
> 
> I think it broke MSI-X support, because xen_pci_msi_domain_info is
> lacking a .flags = MSI_FLAGS_PCI_MSIX?

Hm, I think it only actually *broke* with commit 99f3d27976 ("PCI/MSI:
Reject MSI-X early") from November last year. So 6.1 was OK and we have
time to fix it in 6.2.

Confirmed on real Xen at least that a Fedora 37 install with a 6.0.7
kernel works fine, then on upgrading to Rawhide's 6.2-rc3 it dies with 

[   41.498694] ena 0000:00:03.0 (unnamed net_device) (uninitialized): Failed to enable MSI-X. irq_cnt -524
[   41.498705] ena 0000:00:03.0: Can not reserve msix vectors
[   41.498712] ena 0000:00:03.0: Failed to enable and set the admin interrupts

> > ---
> > Note: This is completely untested, but it compiles so it must be perfect.
> 
> 
> I'm working on making it simple for you to test that, by hosting Xen
> HVM guests natively in qemu (under KVM¹). 
> 
> But I'm absolutely not going to try hacking on both guest and host side
> at the same time when I'm trying to ensure compatibility — that way
> lies madness.
> 
> So for now I'm going to test qemu with older kernels, and maybe someone
> (Jürgen}? can test MSI-X to PIRQ support under real Xen?) FWIW if I add
> the missing MSI_FLAGS_PCI_MSIX flag then under my qemu I get:
> 
>  38:       3180          0  xen-pirq    -msi-x     ens4-rx-0
>  39:          0       3610  xen-pirq    -msi-x     ens4-tx-0
>  40:          1          0  xen-pirq    -msi-x     ens4
> 
> But without the flags I get:
> 
> [    8.464212] e1000e 0000:00:04.0 ens4: Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
> 
> ¹ https://lore.kernel.org/qemu-devel/20230110122042.1562155-1-dwmw2@infradead.org/


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 112+ messages in thread

end of thread, other threads:[~2023-01-15 20:28 UTC | newest]

Thread overview: 112+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-26 11:16 [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Thomas Gleixner
2020-08-26 11:16 ` [patch V2 01/46] iommu/amd: Prevent NULL pointer dereference Thomas Gleixner
2020-08-27 14:57   ` Joerg Roedel
2020-08-26 11:16 ` [patch V2 02/46] x86/init: Remove unused init ops Thomas Gleixner
2020-08-26 11:16 ` [patch V2 03/46] PCI: vmd: Dont abuse vector irqomain as parent Thomas Gleixner
2020-08-26 11:16 ` [patch V2 04/46] genirq/chip: Use the first chip in irq_chip_compose_msi_msg() Thomas Gleixner
2020-08-26 19:50   ` Marc Zyngier
2020-08-26 21:19     ` Thomas Gleixner
2020-08-26 21:32       ` Marc Zyngier
2020-08-26 11:16 ` [patch V2 05/46] x86/msi: Move compose message callback where it belongs Thomas Gleixner
2020-08-26 11:16 ` [patch V2 06/46] x86/msi: Remove pointless vcpu_affinity callback Thomas Gleixner
2020-08-26 11:16 ` [patch V2 07/46] x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency Thomas Gleixner
2020-08-26 11:16 ` [patch V2 08/46] x86/irq: Add allocation type for parent domain retrieval Thomas Gleixner
2020-08-26 11:16 ` [patch V2 09/46] iommu/vt-d: Consolidate irq domain getter Thomas Gleixner
2020-08-26 11:16 ` [patch V2 10/46] iommu/amd: " Thomas Gleixner
2020-08-26 11:16 ` [patch V2 11/46] iommu/irq_remapping: Consolidate irq domain lookup Thomas Gleixner
2020-08-26 11:16 ` [patch V2 12/46] x86/irq: Prepare consolidation of irq_alloc_info Thomas Gleixner
2020-08-26 11:16 ` [patch V2 13/46] x86/msi: Consolidate HPET allocation Thomas Gleixner
2020-08-26 11:16 ` [patch V2 14/46] x86/ioapic: Consolidate IOAPIC allocation Thomas Gleixner
2020-09-08 13:35   ` Wei Liu
2020-08-26 11:16 ` [patch V2 15/46] x86/irq: Consolidate DMAR irq allocation Thomas Gleixner
2020-08-26 16:50   ` Dey, Megha
2020-08-26 18:32     ` Thomas Gleixner
2020-08-26 20:50       ` Thomas Gleixner
2020-08-28  0:12         ` Dey, Megha
2020-08-26 11:16 ` [patch V2 16/46] x86/irq: Consolidate UV domain allocation Thomas Gleixner
2020-08-26 11:16 ` [patch V2 17/46] PCI/MSI: Rework pci_msi_domain_calc_hwirq() Thomas Gleixner
2020-08-26 20:24   ` Marc Zyngier
2020-08-26 11:16 ` [patch V2 18/46] x86/msi: Consolidate MSI allocation Thomas Gleixner
2020-09-08 13:36   ` Wei Liu
2020-08-26 11:16 ` [patch V2 19/46] x86/msi: Use generic MSI domain ops Thomas Gleixner
2020-08-26 20:21   ` Marc Zyngier
2020-08-26 20:43     ` Thomas Gleixner
2020-08-26 11:16 ` [patch V2 20/46] x86/irq: Move apic_post_init() invocation to one place Thomas Gleixner
2020-08-26 11:16 ` [patch V2 21/46] x86/pci: Reducde #ifdeffery in PCI init code Thomas Gleixner
2020-08-26 11:16 ` [patch V2 22/46] x86/irq: Initialize PCI/MSI domain at PCI init time Thomas Gleixner
2020-08-26 11:16 ` [patch V2 23/46] irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI Thomas Gleixner
2020-08-26 20:42   ` Marc Zyngier
2020-08-26 20:57     ` Derrick, Jonathan
2020-08-26 11:16 ` [patch V2 24/46] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI Thomas Gleixner
2020-08-26 20:47   ` Marc Zyngier
2020-08-31 14:39   ` Jason Gunthorpe
2020-09-30 12:45     ` Derrick, Jonathan
2020-09-30 12:57       ` Jason Gunthorpe
2020-09-30 13:08         ` Derrick, Jonathan
2020-09-30 18:47           ` Jason Gunthorpe
2020-08-26 11:16 ` [patch V2 25/46] PCI/MSI: Provide pci_dev_has_special_msi_domain() helper Thomas Gleixner
2020-08-26 11:16 ` [patch V2 26/46] x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init() Thomas Gleixner
2020-08-26 11:16 ` [patch V2 27/46] x86/xen: Rework MSI teardown Thomas Gleixner
2020-08-27  7:46   ` Jürgen Groß
2020-08-26 11:16 ` [patch V2 28/46] x86/xen: Consolidate XEN-MSI init Thomas Gleixner
2020-08-27  7:47   ` Jürgen Groß
2020-08-26 11:16 ` [patch V2 29/46] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs() Thomas Gleixner
2020-08-26 19:06   ` Marc Zyngier
2020-08-26 19:47     ` Thomas Gleixner
2020-08-26 21:33       ` Marc Zyngier
2020-08-28  0:24   ` Dey, Megha
2020-08-26 11:16 ` [patch V2 30/46] x86/xen: Wrap XEN MSI management into irqdomain Thomas Gleixner
2023-01-15 14:12   ` David Woodhouse
2023-01-15 20:27     ` David Woodhouse
2020-08-26 11:16 ` [patch V2 31/46] iommm/vt-d: Store irq domain in struct device Thomas Gleixner
2020-08-26 11:17 ` [patch V2 32/46] iommm/amd: " Thomas Gleixner
2020-08-26 11:17 ` [patch V2 33/46] x86/pci: Set default irq domain in pcibios_add_device() Thomas Gleixner
2020-08-26 11:17 ` [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable Thomas Gleixner
2020-08-26 15:53   ` Thomas Gleixner
2020-08-26 21:14   ` Marc Zyngier
2020-08-26 21:27     ` Thomas Gleixner
2020-08-27 18:20   ` Bjorn Helgaas
2020-08-28 11:21     ` Lorenzo Pieralisi
2020-08-28 12:19       ` Jason Gunthorpe
2020-08-28 12:47         ` Marc Zyngier
2020-08-28 12:54           ` Jason Gunthorpe
2020-08-28 13:52             ` Marc Zyngier
2020-08-28 18:29     ` Thomas Gleixner
2020-09-25 13:54   ` Qian Cai
2020-09-26 12:38     ` Vasily Gorbik
2020-09-28 10:11       ` Thomas Gleixner
2020-08-26 11:17 ` [patch V2 35/46] x86/irq: Cleanup the arch_*_msi_irqs() leftovers Thomas Gleixner
2020-08-26 11:17 ` [patch V2 36/46] x86/irq: Make most MSI ops XEN private Thomas Gleixner
2020-08-26 11:17 ` [patch V2 37/46] iommu/vt-d: Remove domain search for PCI/MSI[X] Thomas Gleixner
2020-08-26 11:17 ` [patch V2 38/46] iommu/amd: Remove domain search for PCI/MSI Thomas Gleixner
2020-08-26 11:17 ` [patch V2 39/46] x86/irq: Add DEV_MSI allocation type Thomas Gleixner
2020-08-26 11:17 ` [patch V2 40/46] x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI Thomas Gleixner
2020-08-26 11:17 ` [patch V2 41/46] platform-msi: Provide default irq_chip:: Ack Thomas Gleixner
2020-08-26 21:25   ` Marc Zyngier
2020-08-26 11:17 ` [patch V2 42/46] genirq/proc: Take buslock on affinity write Thomas Gleixner
2020-08-26 11:17 ` [patch V2 43/46] genirq/msi: Provide and use msi_domain_set_default_info_flags() Thomas Gleixner
2020-08-27  8:17   ` Marc Zyngier
2020-08-28 18:42     ` Thomas Gleixner
2020-08-26 11:17 ` [patch V2 44/46] platform-msi: Add device MSI infrastructure Thomas Gleixner
2020-08-26 11:17 ` [patch V2 45/46] irqdomain/msi: Provide msi_alloc/free_store() callbacks Thomas Gleixner
2020-08-26 11:17 ` [patch V2 46/46] irqchip: Add IMS (Interrupt Message Storm) driver - NOT FOR MERGING Thomas Gleixner
2020-08-31 14:45   ` Jason Gunthorpe
2020-08-28 11:41 ` [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Joerg Roedel
2020-08-31  0:51 ` Lu Baolu
2020-08-31  7:10   ` Thomas Gleixner
2020-08-31  7:29     ` Lu Baolu
2020-09-01  9:06 ` Boqun Feng
2020-09-03 16:35 ` Raj, Ashok
2020-09-03 18:12   ` Thomas Gleixner
2020-09-08  3:39 ` Russ Anderson
2020-09-25 15:29 ` Qian Cai
2020-09-25 15:49   ` Peter Zijlstra
2020-09-25 23:14     ` Thomas Gleixner
2020-09-27  8:46       ` [PATCH] x86/apic/msi: Unbreak DMAR and HPET MSI Thomas Gleixner
2020-09-29 23:03 ` [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI Dey, Megha
2020-09-30  6:41   ` Thomas Gleixner
2020-09-30 11:43     ` Jason Gunthorpe
2020-09-30 15:20       ` Thomas Gleixner
2020-09-30 17:25         ` Dey, Megha
2020-09-30 18:11           ` Thomas Gleixner
2020-11-12 12:55 ` REGRESSION: " Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).