KVM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/5] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported
@ 2020-10-07 12:20 David Woodhouse
  2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
  0 siblings, 2 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-07 12:20 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel


[-- Attachment #1: Type: text/plain, Size: 1668 bytes --]

Splitting out the simpler parts of my previous patch set. The full
support for per-irqdomain affinity limits will take a bit more work but
this part is quite simple.

Since we don't yet have per-irqdomain affinity, we currently attempt to
avoid bringing CPUs online at all if they can't be targeted by external
interrupts. Except we still let them get hotplugged later... which is
moderately suboptimal.

Fix that, and support the hypervisor enlightenment which at least
extends the range of targetable APIC IDs to 15 bits, as seen in the
patch at https://patchwork.kernel.org/patch/11816693/ for qemu.


David Woodhouse (5):
      x86/apic: Fix x2apic enablement without interrupt remapping
      x86/msi: Only use high bits of MSI address for DMAR unit
      x86/ioapic: Handle Extended Destination ID field in RTE
      x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available
      x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID

 Documentation/virt/kvm/cpuid.rst     |  4 ++++
 arch/x86/include/asm/apic.h          |  1 +
 arch/x86/include/asm/io_apic.h       |  3 ++-
 arch/x86/include/asm/mpspec.h        |  1 +
 arch/x86/include/asm/x86_init.h      |  2 ++
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kernel/apic/apic.c          | 27 +++++++++++++++++++++------
 arch/x86/kernel/apic/io_apic.c       | 19 +++++++++++++------
 arch/x86/kernel/apic/msi.c           | 41 +++++++++++++++++++++++++++++++++++------
 arch/x86/kernel/apic/x2apic_phys.c   |  9 +++++++++
 arch/x86/kernel/kvm.c                |  6 ++++++
 arch/x86/kernel/x86_init.c           |  1 +
 12 files changed, 96 insertions(+), 19 deletions(-)

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping
  2020-10-07 12:20 [PATCH 0/5] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
@ 2020-10-07 12:20 ` David Woodhouse
  2020-10-07 12:20   ` [PATCH 2/5] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
                     ` (4 more replies)
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
  1 sibling, 5 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-07 12:20 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel

From: David Woodhouse <dwmw@amazon.co.uk>

Currently, Linux as a hypervisor guest will enable x2apic only if there
are no CPUs present at boot time with an APIC ID above 255.

Hotplugging a CPU later with a higher APIC ID would result in a CPU
which cannot be targeted by external interrupts.

Add a filter in x2apic_apic_id_valid() which can be used to prevent
such CPUs from coming online, and allow x2apic to be enabled even if
they are present at boot time.

Fixes: ce69a784504 ("x86/apic: Enable x2APIC without interrupt remapping under KVM")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/apic.h        |  1 +
 arch/x86/kernel/apic/apic.c        | 14 ++++++++------
 arch/x86/kernel/apic/x2apic_phys.c |  9 +++++++++
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 1c129abb7f09..b0fd204e0023 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -259,6 +259,7 @@ static inline u64 native_x2apic_icr_read(void)
 
 extern int x2apic_mode;
 extern int x2apic_phys;
+extern void __init x2apic_set_max_apicid(u32 apicid);
 extern void __init check_x2apic(void);
 extern void x2apic_setup(void);
 static inline int x2apic_enabled(void)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b3eef1d5c903..113f6ca7b828 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1841,20 +1841,22 @@ static __init void try_to_enable_x2apic(int remap_mode)
 		return;
 
 	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
-		/* IR is required if there is APIC ID > 255 even when running
-		 * under KVM
+		/*
+		 * Using X2APIC without IR is not architecturally supported
+		 * on bare metal but may be supported in guests.
 		 */
-		if (max_physical_apicid > 255 ||
-		    !x86_init.hyper.x2apic_available()) {
+		if (!x86_init.hyper.x2apic_available()) {
 			pr_info("x2apic: IRQ remapping doesn't support X2APIC mode\n");
 			x2apic_disable();
 			return;
 		}
 
 		/*
-		 * without IR all CPUs can be addressed by IOAPIC/MSI
-		 * only in physical mode
+		 * Without IR, all CPUs can be addressed by IOAPIC/MSI only
+		 * in physical mode, and CPUs with an APIC ID that cannnot
+		 * be addressed must not be brought online.
 		 */
+		x2apic_set_max_apicid(255);
 		x2apic_phys = 1;
 	}
 	x2apic_enable();
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index bc9693841353..b4b4e89c1118 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -8,6 +8,12 @@
 int x2apic_phys;
 
 static struct apic apic_x2apic_phys;
+static u32 x2apic_max_apicid;
+
+void __init x2apic_set_max_apicid(u32 apicid)
+{
+	x2apic_max_apicid = apicid;
+}
 
 static int __init set_x2apic_phys_mode(char *arg)
 {
@@ -98,6 +104,9 @@ static int x2apic_phys_probe(void)
 /* Common x2apic functions, also used by x2apic_cluster */
 int x2apic_apic_id_valid(u32 apicid)
 {
+	if (x2apic_max_apicid && apicid > x2apic_max_apicid)
+		return 0;
+
 	return 1;
 }
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 2/5] x86/msi: Only use high bits of MSI address for DMAR unit
  2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
@ 2020-10-07 12:20   ` David Woodhouse
  2020-10-07 12:20   ` [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-07 12:20 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel

From: David Woodhouse <dwmw@amazon.co.uk>

The Intel IOMMU has an MSI-like configuration for its interrupt, but
it isn't really MSI. So it gets to abuse the high 32 bits of the address,
and puts the high 24 bits of the extended APIC ID there.

This isn't something that can be used in the general case for real MSIs,
since external devices using the high bits of the address would be
performing writes to actual memory space above 4GiB, not targeted at the
APIC.

Factor the hack out and allow it only to be used when appropriate,
adding a WARN_ON_ONCE() if other MSIs are targeted at an unreachable
APIC ID. In *theory* that should never happen since the compatibility
MSI messages are not supposed to be used with Interrupt Remapping
enabled. In practice, if IR is enabled but some devices aren't within
scope of any given remapping unit, it might happen. But that's a longer
story and this warning is the right thing to do in that case for the
short term.

The x2apic_enabled() check isn't needed because Linux won't bring up
CPUs with higher APIC IDs unless x2apic is enabled anyway.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kernel/apic/msi.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 6313f0a05db7..2825e003259c 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,13 +23,11 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
-static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
+static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+				  bool dmar)
 {
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
-	if (x2apic_enabled())
-		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
-
 	msg->address_lo =
 		MSI_ADDR_BASE_LO |
 		((apic->irq_dest_mode == 0) ?
@@ -43,18 +41,40 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 		MSI_DATA_LEVEL_ASSERT |
 		MSI_DATA_DELIVERY_FIXED |
 		MSI_DATA_VECTOR(cfg->vector);
+
+	/*
+	 * Only the IOMMU itself can use the trick of putting destination
+	 * APIC ID into the high bits of the address. Anything else would
+	 * just be writing to memory if it tried that, and needs IR to
+	 * address higher APIC IDs.
+	 */
+	if (dmar)
+		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+	else
+		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
 }
 
 void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
-	__irq_msi_compose_msg(irqd_cfg(data), msg);
+	__irq_msi_compose_msg(irqd_cfg(data), msg, false);
+}
+
+/*
+ * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the
+ * high bits of the destination APIC ID. This can't be done in the general
+ * case for MSIs as it would be targeting real memory above 4GiB not the
+ * APIC.
+ */
+static void dmar_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	__irq_msi_compose_msg(irqd_cfg(data), msg, true);
 }
 
 static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 {
 	struct msi_msg msg[2] = { [1] = { }, };
 
-	__irq_msi_compose_msg(cfg, msg);
+	__irq_msi_compose_msg(cfg, msg, 0);
 	irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
 }
 
@@ -288,6 +308,7 @@ static struct irq_chip dmar_msi_controller = {
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_set_affinity	= msi_domain_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_compose_msi_msg	= dmar_msi_compose_msg,
 	.irq_write_msi_msg	= dmar_msi_write_msg,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE
  2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
  2020-10-07 12:20   ` [PATCH 2/5] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
@ 2020-10-07 12:20   ` David Woodhouse
  2020-10-08  9:12     ` Peter Zijlstra
  2020-10-08 11:41     ` Thomas Gleixner
  2020-10-07 12:20   ` [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available David Woodhouse
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-07 12:20 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel

From: David Woodhouse <dwmw@amazon.co.uk>

The IOAPIC Redirection Table Entries contain an 8-bit Extended
Destination ID field which maps to bits 11-4 of the MSI address.

The lowest bit is used to indicate remappable format, when interrupt
remapping is in use. A hypervisor can use the other 7 bits to permit
guests to address up to 15 bits of APIC IDs, thus allowing 32768 vCPUs
before having to expose a vIOMMU and interrupt remapping to the guest.

No behavioural change in this patch, since nothing yet permits APIC IDs
above 255 to be used with the non-IR IOAPIC domain. Except for the case
where IR is enabled but there are IOAPICs which aren't in the scope of
any IOMMU, which is totally hosed anyway and needs fixing independently
of this change.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/io_apic.h |  3 ++-
 arch/x86/kernel/apic/io_apic.c | 19 +++++++++++++------
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index a1a26f6d3aa4..e65a0b7379d0 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -78,7 +78,8 @@ struct IO_APIC_route_entry {
 		mask		:  1,	/* 0: enabled, 1: disabled */
 		__reserved_2	: 15;
 
-	__u32	__reserved_3	: 24,
+	__u32	__reserved_3	: 17,
+		ext_dest	:  7,
 		dest		:  8;
 } __attribute__ ((packed));
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index a33380059db6..aa9a3b54a96c 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1239,10 +1239,10 @@ static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
 			       buf, (ir_entry->index2 << 15) | ir_entry->index,
 			       ir_entry->zero);
 		else
-			printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n",
+			printk(KERN_DEBUG "%s, %s, D(%02X%02X), M(%1d)\n",
 			       buf,
 			       entry.dest_mode == IOAPIC_DEST_MODE_LOGICAL ?
-			       "logical " : "physical",
+			       "logical " : "physical", entry.ext_dest,
 			       entry.dest, entry.delivery_mode);
 	}
 }
@@ -1410,6 +1410,7 @@ void native_restore_boot_irq_mode(void)
 	 */
 	if (ioapic_i8259.pin != -1) {
 		struct IO_APIC_route_entry entry;
+		u32 apic_id = read_apic_id();
 
 		memset(&entry, 0, sizeof(entry));
 		entry.mask		= IOAPIC_UNMASKED;
@@ -1417,7 +1418,8 @@ void native_restore_boot_irq_mode(void)
 		entry.polarity		= IOAPIC_POL_HIGH;
 		entry.dest_mode		= IOAPIC_DEST_MODE_PHYSICAL;
 		entry.delivery_mode	= dest_ExtINT;
-		entry.dest		= read_apic_id();
+		entry.dest		= apic_id & 0xff;
+		entry.ext_dest		= apic_id >> 8;
 
 		/*
 		 * Add it to the IO-APIC irq-routing table:
@@ -1861,7 +1863,8 @@ static void ioapic_configure_entry(struct irq_data *irqd)
 	 * ioapic chip to verify that.
 	 */
 	if (irqd->chip == &ioapic_chip) {
-		mpd->entry.dest = cfg->dest_apicid;
+		mpd->entry.dest = cfg->dest_apicid & 0xff;
+		mpd->entry.ext_dest = cfg->dest_apicid >> 8;
 		mpd->entry.vector = cfg->vector;
 	}
 	for_each_irq_pin(entry, mpd->irq_2_pin)
@@ -2027,6 +2030,7 @@ static inline void __init unlock_ExtINT_logic(void)
 	int apic, pin, i;
 	struct IO_APIC_route_entry entry0, entry1;
 	unsigned char save_control, save_freq_select;
+	u32 apic_id;
 
 	pin  = find_isa_irq_pin(8, mp_INT);
 	if (pin == -1) {
@@ -2042,11 +2046,13 @@ static inline void __init unlock_ExtINT_logic(void)
 	entry0 = ioapic_read_entry(apic, pin);
 	clear_IO_APIC_pin(apic, pin);
 
+	apic_id = hard_smp_processor_id();
 	memset(&entry1, 0, sizeof(entry1));
 
 	entry1.dest_mode = IOAPIC_DEST_MODE_PHYSICAL;
 	entry1.mask = IOAPIC_UNMASKED;
-	entry1.dest = hard_smp_processor_id();
+	entry1.dest = apic_id & 0xff;
+	entry1.ext_dest = apic_id >> 8;
 	entry1.delivery_mode = dest_ExtINT;
 	entry1.polarity = entry0.polarity;
 	entry1.trigger = IOAPIC_EDGE;
@@ -2949,7 +2955,8 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 	memset(entry, 0, sizeof(*entry));
 	entry->delivery_mode = apic->irq_delivery_mode;
 	entry->dest_mode     = apic->irq_dest_mode;
-	entry->dest	     = cfg->dest_apicid;
+	entry->dest	     = cfg->dest_apicid & 0xff;
+	entry->ext_dest	     = cfg->dest_apicid >> 8;
 	entry->vector	     = cfg->vector;
 	entry->trigger	     = data->trigger;
 	entry->polarity	     = data->polarity;
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available
  2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
  2020-10-07 12:20   ` [PATCH 2/5] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
  2020-10-07 12:20   ` [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
@ 2020-10-07 12:20   ` David Woodhouse
  2020-10-08 11:54     ` Thomas Gleixner
  2020-10-07 12:20   ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
  2020-10-08 11:46   ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping Thomas Gleixner
  4 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2020-10-07 12:20 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel

From: David Woodhouse <dwmw@amazon.co.uk>

Some hypervisors can allow the guest to use the Extended Destination ID
field in the IOAPIC RTE and MSI address to address up to 32768 CPUs.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/mpspec.h   |  1 +
 arch/x86/include/asm/x86_init.h |  2 ++
 arch/x86/kernel/apic/apic.c     | 15 ++++++++++++++-
 arch/x86/kernel/apic/msi.c      | 10 +++++++++-
 arch/x86/kernel/x86_init.c      |  1 +
 5 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index e90ac7e9ae2c..25ee8ca0a1f2 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -42,6 +42,7 @@ extern DECLARE_BITMAP(mp_bus_not_pci, MAX_MP_BUSSES);
 extern unsigned int boot_cpu_physical_apicid;
 extern u8 boot_cpu_apic_version;
 extern unsigned long mp_lapic_addr;
+extern int msi_ext_dest_id;
 
 #ifdef CONFIG_X86_LOCAL_APIC
 extern int smp_found_config;
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 397196fae24d..5af3fe9e38f3 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -114,6 +114,7 @@ struct x86_init_pci {
  * @init_platform:		platform setup
  * @guest_late_init:		guest late init
  * @x2apic_available:		X2APIC detection
+ * @msi_ext_dest_id:		MSI and IOAPIC support 15-bit APIC IDs
  * @init_mem_mapping:		setup early mappings during init_mem_mapping()
  * @init_after_bootmem:		guest init after boot allocator is finished
  */
@@ -121,6 +122,7 @@ struct x86_hyper_init {
 	void (*init_platform)(void);
 	void (*guest_late_init)(void);
 	bool (*x2apic_available)(void);
+	bool (*msi_ext_dest_id)(void);
 	void (*init_mem_mapping)(void);
 	void (*init_after_bootmem)(void);
 };
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 113f6ca7b828..ba24a343c1f2 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1837,9 +1837,21 @@ static __init void x2apic_enable(void)
 
 static __init void try_to_enable_x2apic(int remap_mode)
 {
+	u32 apic_limit = 255;
+
 	if (x2apic_state == X2APIC_DISABLED)
 		return;
 
+	/*
+	 * If the hypervisor supports extended destination ID in IOAPIC
+	 * and MSI, that increases the maximum APIC ID that can be used
+	 * for non-remapped IRQ domains.
+	 */
+	if (x86_init.hyper.msi_ext_dest_id()) {
+		msi_ext_dest_id = 1;
+		apic_limit = 32767;
+	}
+
 	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
 		/*
 		 * Using X2APIC without IR is not architecturally supported
@@ -1856,9 +1868,10 @@ static __init void try_to_enable_x2apic(int remap_mode)
 		 * in physical mode, and CPUs with an APIC ID that cannnot
 		 * be addressed must not be brought online.
 		 */
-		x2apic_set_max_apicid(255);
+		x2apic_set_max_apicid(apic_limit);
 		x2apic_phys = 1;
 	}
+
 	x2apic_enable();
 }
 
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 2825e003259c..85206f971284 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,8 +23,11 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
+int msi_ext_dest_id __ro_after_init;
+
 static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 				  bool dmar)
+
 {
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
@@ -46,10 +49,15 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 	 * Only the IOMMU itself can use the trick of putting destination
 	 * APIC ID into the high bits of the address. Anything else would
 	 * just be writing to memory if it tried that, and needs IR to
-	 * address higher APIC IDs.
+	 * address APICs which can't be addressed in the normal 32-bit
+	 * address range at 0xFFExxxxx. That is typically just 8 bits, but
+	 * some hypervisors allow the extended destination ID field in bits
+	 * 11-5 to be used, giving support for 15 bits of APIC IDs in total.
 	 */
 	if (dmar)
 		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+	else if (msi_ext_dest_id && cfg->dest_apicid < 0x8000)
+		msg->address_lo |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid) >> 3;
 	else
 		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
 }
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index a3038d8deb6a..8b395821cb8d 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -110,6 +110,7 @@ struct x86_init_ops x86_init __initdata = {
 		.init_platform		= x86_init_noop,
 		.guest_late_init	= x86_init_noop,
 		.x2apic_available	= bool_x86_init_noop,
+		.msi_ext_dest_id	= bool_x86_init_noop,
 		.init_mem_mapping	= x86_init_noop,
 		.init_after_bootmem	= x86_init_noop,
 	},
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
                     ` (2 preceding siblings ...)
  2020-10-07 12:20   ` [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available David Woodhouse
@ 2020-10-07 12:20   ` David Woodhouse
  2020-10-08 12:05     ` Thomas Gleixner
  2020-10-08 11:46   ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping Thomas Gleixner
  4 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2020-10-07 12:20 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel

From: David Woodhouse <dwmw@amazon.co.uk>

This allows the host to indicate that IOAPIC and MSI emulation supports
15-bit destination IDs, allowing up to 32768 CPUs without interrupt
remapping.

cf. https://patchwork.kernel.org/patch/11816693/ for qemu

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/cpuid.rst     | 4 ++++
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 arch/x86/kernel/kvm.c                | 6 ++++++
 3 files changed, 11 insertions(+)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index a7dff9186bed..1726b5925d2b 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT          14          guest checks this feature bit
                                               async pf acknowledgment msr
                                               0x4b564d07.
 
+KVM_FEATURE_MSI_EXT_DEST_ID       15          guest checks this feature bit
+                                              before using extended destination
+                                              ID bits in MSI address bits 11-5.
+
 KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24          host will warn if no guest-side
                                               per-cpu warps are expeced in
                                               kvmclock
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 812e9b4c1114..950afebfba88 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -32,6 +32,7 @@
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
 #define KVM_FEATURE_ASYNC_PF_INT	14
+#define KVM_FEATURE_MSI_EXT_DEST_ID	15
 
 #define KVM_HINTS_REALTIME      0
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 1b51b727b140..4986b4399aef 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -743,12 +743,18 @@ static void __init kvm_init_platform(void)
 	x86_platform.apic_post_init = kvm_apic_init;
 }
 
+static bool __init kvm_msi_ext_dest_id(void)
+{
+	return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID);
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_kvm = {
 	.name			= "KVM",
 	.detect			= kvm_detect,
 	.type			= X86_HYPER_KVM,
 	.init.guest_late_init	= kvm_guest_init,
 	.init.x2apic_available	= kvm_para_available,
+	.init.msi_ext_dest_id	= kvm_msi_ext_dest_id,
 	.init.init_platform	= kvm_init_platform,
 };
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE
  2020-10-07 12:20   ` [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
@ 2020-10-08  9:12     ` Peter Zijlstra
  2020-10-08 17:05       ` David Woodhouse
  2020-10-08 11:41     ` Thomas Gleixner
  1 sibling, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2020-10-08  9:12 UTC (permalink / raw)
  To: David Woodhouse; +Cc: x86, kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel

On Wed, Oct 07, 2020 at 01:20:44PM +0100, David Woodhouse wrote:
> @@ -1861,7 +1863,8 @@ static void ioapic_configure_entry(struct irq_data *irqd)
>  	 * ioapic chip to verify that.
>  	 */
>  	if (irqd->chip == &ioapic_chip) {
> -		mpd->entry.dest = cfg->dest_apicid;
> +		mpd->entry.dest = cfg->dest_apicid & 0xff;
> +		mpd->entry.ext_dest = cfg->dest_apicid >> 8;
>  		mpd->entry.vector = cfg->vector;
>  	}
>  	for_each_irq_pin(entry, mpd->irq_2_pin)

All the other sites did memset(0) before the assignment, and this the
extra unconditional write of 0 to ext_dest is harmless.

This might be true for this site too, but it wasn't immediately obvious.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE
  2020-10-07 12:20   ` [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
  2020-10-08  9:12     ` Peter Zijlstra
@ 2020-10-08 11:41     ` Thomas Gleixner
  1 sibling, 0 replies; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-08 11:41 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Wed, Oct 07 2020 at 13:20, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> The IOAPIC Redirection Table Entries contain an 8-bit Extended
> Destination ID field which maps to bits 11-4 of the MSI address.
>
> The lowest bit is used to indicate remappable format, when interrupt
> remapping is in use. A hypervisor can use the other 7 bits to permit
> guests to address up to 15 bits of APIC IDs, thus allowing 32768 vCPUs
> before having to expose a vIOMMU and interrupt remapping to the guest.
>
> No behavioural change in this patch, since nothing yet permits APIC IDs
> above 255 to be used with the non-IR IOAPIC domain. Except for the case
> where IR is enabled but there are IOAPICs which aren't in the scope of
> any IOMMU, which is totally hosed anyway and needs fixing independently
> of this change.

Again: IOAPICs which are not covered by IR are detected and prevent IR
enablement which makes the above a fairy tale. Changelogs are about
facts.

> diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
> index a1a26f6d3aa4..e65a0b7379d0 100644
> --- a/arch/x86/include/asm/io_apic.h
> +++ b/arch/x86/include/asm/io_apic.h
> @@ -78,7 +78,8 @@ struct IO_APIC_route_entry {
>  		mask		:  1,	/* 0: enabled, 1: disabled */
>  		__reserved_2	: 15;
>  
> -	__u32	__reserved_3	: 24,
> +	__u32	__reserved_3	: 17,
> +		ext_dest	:  7,

This wants to be explicitely named 'virt_ext_dest'.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping
  2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
                     ` (3 preceding siblings ...)
  2020-10-07 12:20   ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
@ 2020-10-08 11:46   ` Thomas Gleixner
  4 siblings, 0 replies; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-08 11:46 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Wed, Oct 07 2020 at 13:20, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>  
>  static struct apic apic_x2apic_phys;
> +static u32 x2apic_max_apicid;

__ro_after_init?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available
  2020-10-07 12:20   ` [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available David Woodhouse
@ 2020-10-08 11:54     ` Thomas Gleixner
  2020-10-08 12:02       ` Thomas Gleixner
  2020-10-08 13:00       ` David Woodhouse
  0 siblings, 2 replies; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-08 11:54 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Wed, Oct 07 2020 at 13:20, David Woodhouse wrote:
>  
> +	/*
> +	 * If the hypervisor supports extended destination ID in IOAPIC
> +	 * and MSI, that increases the maximum APIC ID that can be used
> +	 * for non-remapped IRQ domains.
> +	 */
> +	if (x86_init.hyper.msi_ext_dest_id()) {
> +		msi_ext_dest_id = 1;
> +		apic_limit = 32767;
> +	}

This needs to be outside of the remap mode check because?

> +
>  	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
>  		/*
>  		 * Using X2APIC without IR is not architecturally supported
> @@ -1856,9 +1868,10 @@ static __init void try_to_enable_x2apic(int remap_mode)
>  		 * in physical mode, and CPUs with an APIC ID that cannnot
>  		 * be addressed must not be brought online.
>  		 */
> -		x2apic_set_max_apicid(255);
> +		x2apic_set_max_apicid(apic_limit);
>  		x2apic_phys = 1;
>  	}
> +
>  	x2apic_enable();
>  }
>  
> diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
> index 2825e003259c..85206f971284 100644
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -23,8 +23,11 @@
>  
>  struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
>  
> +int msi_ext_dest_id __ro_after_init;

bool please.

Aside of that this breaks the build for a kernel with CONFIG_PCI_MSI=n

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available
  2020-10-08 11:54     ` Thomas Gleixner
@ 2020-10-08 12:02       ` Thomas Gleixner
  2020-10-08 13:00       ` David Woodhouse
  1 sibling, 0 replies; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-08 12:02 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Thu, Oct 08 2020 at 13:54, Thomas Gleixner wrote:
> On Wed, Oct 07 2020 at 13:20, David Woodhouse wrote:
>> diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
>> index 2825e003259c..85206f971284 100644
>> --- a/arch/x86/kernel/apic/msi.c
>> +++ b/arch/x86/kernel/apic/msi.c
>> @@ -23,8 +23,11 @@
>>  
>>  struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
>>  
>> +int msi_ext_dest_id __ro_after_init;
>
> bool please.
>
> Aside of that this breaks the build for a kernel with CONFIG_PCI_MSI=n

So this wants to be

bool virt_ext_dest_id __ro_after_init;

in apic.c and then please make the IO/APIC places depend on this as well
so any change to the utilization of the reserved IO/APIC bits in the
future is not going to end up in surprises.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-07 12:20   ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
@ 2020-10-08 12:05     ` Thomas Gleixner
  2020-10-08 12:55       ` David Woodhouse
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-08 12:05 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Wed, Oct 07 2020 at 13:20, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> This allows the host to indicate that IOAPIC and MSI emulation supports
> 15-bit destination IDs, allowing up to 32768 CPUs without interrupt
> remapping.
>
> cf. https://patchwork.kernel.org/patch/11816693/ for qemu
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  Documentation/virt/kvm/cpuid.rst     | 4 ++++
>  arch/x86/include/uapi/asm/kvm_para.h | 1 +
>  arch/x86/kernel/kvm.c                | 6 ++++++
>  3 files changed, 11 insertions(+)
>
> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index a7dff9186bed..1726b5925d2b 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT          14          guest checks this feature bit
>                                                async pf acknowledgment msr
>                                                0x4b564d07.
>  
> +KVM_FEATURE_MSI_EXT_DEST_ID       15          guest checks this feature bit
> +                                              before using extended destination
> +                                              ID bits in MSI address
> bits 11-5.

Why MSI_EXT_DEST_ID? It's enabling that for MSI and IO/APIC. The
underlying mechanism might be the same, but APIC_EXT_DEST_ID is more
general and then you might also make the explanation of that bit match
the changelog.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-08 12:05     ` Thomas Gleixner
@ 2020-10-08 12:55       ` David Woodhouse
  2020-10-08 16:08         ` David Woodhouse
  0 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2020-10-08 12:55 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel


[-- Attachment #1: Type: text/plain, Size: 1819 bytes --]

On Thu, 2020-10-08 at 14:05 +0200, Thomas Gleixner wrote:
> Why MSI_EXT_DEST_ID? It's enabling that for MSI and IO/APIC. The
> underlying mechanism might be the same, but APIC_EXT_DEST_ID is more
> general and then you might also make the explanation of that bit
> match the changelog.

It's enabling it for *everything* that generates MSI cycles — which
includes IOAPIC, HPET, and MSI-capable PCI devices. Hell, and anything
else which feels like generating a physical address cycle to 0xFEExxxxx
addresses.

Again, the IOAPIC is just a device for turning pin-based interrupts
into MSIs. Bits 19-4 of the address it writes to are taken directly
from bits 63-48 of the IOAPIC RTE. There's complexity elsewhere but for
*those* bits, It just uses the bits it's given, just like a PCI device
or an HPET would. 

When I implemented this in qemu I didn't even *touch* the IOAPIC
support; it doesn't affect the IOAPIC at all, just like it doesn't
affect the HPET or any of the MSI-capable PCI devices that qemu
emulates. They just put the bits on the bus that they're told to, when
they want to generate an interrupt.

This feature is an MSI feature.

Not an HPET feature.

Not a PCI feature.

Not an IOAPIC feature.

The fact that a *Linux* guest has special-case knowledge in its IOAPIC
driver that duplicates what the MSI message composition does, is not a
good justification for naming the feature bit bizarrely.

In fact I'm really tempted to make Linux's io_apic.c just use
irq_chip_compose_msi_msg() and swizzle the bits out of the message
identically for IR and non-IR alike (modulo the pin hack), and delete
the IR_IO_APIC_route_entry struct completely. 

That also completely removes the ext_dest_id trick from visibility in
io_apic.c. And might avoid further confusion.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available
  2020-10-08 11:54     ` Thomas Gleixner
  2020-10-08 12:02       ` Thomas Gleixner
@ 2020-10-08 13:00       ` David Woodhouse
  1 sibling, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-08 13:00 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel


[-- Attachment #1: Type: text/plain, Size: 1713 bytes --]

On Thu, 2020-10-08 at 13:54 +0200, Thomas Gleixner wrote:
> On Wed, Oct 07 2020 at 13:20, David Woodhouse wrote:
> >  
> > +	/*
> > +	 * If the hypervisor supports extended destination ID in IOAPIC
> > +	 * and MSI, that increases the maximum APIC ID that can be used
> > +	 * for non-remapped IRQ domains.
> > +	 */
> > +	if (x86_init.hyper.msi_ext_dest_id()) {
> > +		msi_ext_dest_id = 1;
> > +		apic_limit = 32767;
> > +	}
> 
> This needs to be outside of the remap mode check because?


Once upon a time, there was a later patch in the series which *also*
used the apic_limit variable to generate a maximum affinity mask.

Now we've ditched that idea, I can put this back inside the remap mode
check.

> 
> > +
> >  	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
> >  		/*
> >  		 * Using X2APIC without IR is not architecturally supported
> > @@ -1856,9 +1868,10 @@ static __init void try_to_enable_x2apic(int remap_mode)
> >  		 * in physical mode, and CPUs with an APIC ID that cannnot
> >  		 * be addressed must not be brought online.
> >  		 */
> > -		x2apic_set_max_apicid(255);
> > +		x2apic_set_max_apicid(apic_limit);
> >  		x2apic_phys = 1;
> >  	}
> > +
> >  	x2apic_enable();
> >  }
> >  
> > diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
> > index 2825e003259c..85206f971284 100644
> > --- a/arch/x86/kernel/apic/msi.c
> > +++ b/arch/x86/kernel/apic/msi.c
> > @@ -23,8 +23,11 @@
> >  
> >  struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
> >  
> > +int msi_ext_dest_id __ro_after_init;
> 
> bool please.
> 
> Aside of that this breaks the build for a kernel with CONFIG_PCI_MSI=n

Will fix (and rename).


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-08 12:55       ` David Woodhouse
@ 2020-10-08 16:08         ` David Woodhouse
  2020-10-08 21:14           ` Thomas Gleixner
  0 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2020-10-08 16:08 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel


[-- Attachment #1: Type: text/plain, Size: 13159 bytes --]

On Thu, 2020-10-08 at 13:55 +0100, David Woodhouse wrote:
> 
> In fact I'm really tempted to make Linux's io_apic.c just use
> irq_chip_compose_msi_msg() and swizzle the bits out of the message
> identically for IR and non-IR alike (modulo the pin hack), and delete
> the IR_IO_APIC_route_entry struct completely. 
> 
> That also completely removes the ext_dest_id trick from visibility in
> io_apic.c. And might avoid further confusion.

That looks a bit like this, FWIW. Turns out it doesn't *entirely*
remove the ext_dest_id trick from io_apic.c because it does still need
to put together an entry manually for the ExtINT hackery and for
restoring boot mode. But it does remove a lot of incestuous ioapic
hackery from IOMMU drivers, and deletes more code than it adds.

It has the additional effect of enabling the WARN_ON for unreachable
APIC IDs in __irq_msi_compose_msg(), for IOAPIC interrupts too.

(We'd want the x86_vector_domain to actually have an MSI compose
function in the !CONFIG_PCI_MSI case if we did this, of course.)

From 2fbc79588d4677ee1cc9df661162fcf1a7da57f0 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@amazon.co.uk>
Date: Thu, 8 Oct 2020 15:44:42 +0100
Subject: [PATCH 6/5] x86/ioapic: Generate RTE directly from parent irqchip's MSI
 message

The IOAPIC generates an MSI cycle with address/data bits taken from its
Redirection Table Entry in some combination which used to make sense,
but now is just a bunch of bits which get passed through in some
seemingly arbitrary order.

Instead of making IRQ remapping drivers directly frob the IOAPIC RTE,
let them just do their job and generate an MSI message. The bit
swizzling to turn that MSI message into the IOAPIC's RTE is the same in
all cases, since it's a function of the IOAPIC hardware. The IRQ
remappers have no real need to get involved with that.

The only slight caveat is that the IOAPIC is interpreting some of
those fields too, and it does want the 'vector' field to be unique
to make EOI work. The AMD IOMMU happens to put its IRTE index in the
bits that the IOAPIC thinks are the vector field, and accommodates
this requirement by reserving the first 32 indices for the IOAPIC.
The Intel IOMMU doesn't actually use the bits that the IOAPIC thinks
are the vector field, so it fills in the 'pin' value there instead.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/hw_irq.h       | 11 +++---
 arch/x86/include/asm/msidef.h       |  2 ++
 arch/x86/kernel/apic/io_apic.c      | 55 ++++++++++++++++++-----------
 drivers/iommu/amd/iommu.c           | 14 --------
 drivers/iommu/hyperv-iommu.c        | 31 ----------------
 drivers/iommu/intel/irq_remapping.c | 19 +++-------
 6 files changed, 46 insertions(+), 86 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index a4aeeaace040..aabd8f1b6bb0 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -45,12 +45,11 @@ enum irq_alloc_type {
 };
 
 struct ioapic_alloc_info {
-	int				pin;
-	int				node;
-	u32				trigger : 1;
-	u32				polarity : 1;
-	u32				valid : 1;
-	struct IO_APIC_route_entry	*entry;
+	int		pin;
+	int		node;
+	u32		trigger : 1;
+	u32		polarity : 1;
+	u32		valid : 1;
 };
 
 struct uv_alloc_info {
diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index ee2f8ccc32d0..37c3d2d492c9 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -18,6 +18,7 @@
 #define MSI_DATA_DELIVERY_MODE_SHIFT	8
 #define  MSI_DATA_DELIVERY_FIXED	(0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
+#define  MSI_DATA_DELIVERY_MODE_MASK	(3 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
 #define MSI_DATA_LEVEL_SHIFT		14
 #define	 MSI_DATA_LEVEL_DEASSERT	(0 << MSI_DATA_LEVEL_SHIFT)
@@ -37,6 +38,7 @@
 #define MSI_ADDR_DEST_MODE_SHIFT	2
 #define  MSI_ADDR_DEST_MODE_PHYSICAL	(0 << MSI_ADDR_DEST_MODE_SHIFT)
 #define	 MSI_ADDR_DEST_MODE_LOGICAL	(1 << MSI_ADDR_DEST_MODE_SHIFT)
+#define  MSI_ADDR_DEST_MODE_MASK	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
 #define MSI_ADDR_REDIRECTION_SHIFT	3
 #define  MSI_ADDR_REDIRECTION_CPU	(0 << MSI_ADDR_REDIRECTION_SHIFT)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 54f6a029b1d1..ca2da19d5c55 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -48,6 +48,7 @@
 #include <linux/jiffies.h>	/* time_after() */
 #include <linux/slab.h>
 #include <linux/memblock.h>
+#include <linux/msi.h>
 
 #include <asm/irqdomain.h>
 #include <asm/io.h>
@@ -63,6 +64,7 @@
 #include <asm/setup.h>
 #include <asm/irq_remapping.h>
 #include <asm/hw_irq.h>
+#include <asm/msidef.h>
 
 #include <asm/apic.h>
 
@@ -1851,22 +1853,36 @@ static void ioapic_ir_ack_level(struct irq_data *irq_data)
 	eoi_ioapic_pin(data->entry.vector, data);
 }
 
+static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, void *_entry)
+{
+	struct msi_msg msg;
+	u32 *entry = _entry;
+
+	irq_chip_compose_msi_msg(irq_data, &msg);
+
+	/*
+	 * They're in a bit of a random order for historical reasons, but
+	 * the IO/APIC is just a device for turning interrupt lines into
+	 * MSIs, and various bits of the MSI addr/data are just swizzled
+	 * into/from the bits of Redirection Table Entry.
+	 */
+	entry[0] &= 0xfffff000;
+	entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK |
+				 MSI_DATA_VECTOR_MASK));
+	entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9;
+
+	entry[1] &= 0xffff;
+	entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12;
+}
+
+
 static void ioapic_configure_entry(struct irq_data *irqd)
 {
 	struct mp_chip_data *mpd = irqd->chip_data;
-	struct irq_cfg *cfg = irqd_cfg(irqd);
 	struct irq_pin_list *entry;
 
-	/*
-	 * Only update when the parent is the vector domain, don't touch it
-	 * if the parent is the remapping domain. Check the installed
-	 * ioapic chip to verify that.
-	 */
-	if (irqd->chip == &ioapic_chip) {
-		mpd->entry.dest = cfg->dest_apicid & 0xff;
-		mpd->entry.virt_ext_dest = cfg->dest_apicid >> 8;
-		mpd->entry.vector = cfg->vector;
-	}
+	mp_swizzle_msi_dest_bits(irqd, &mpd->entry);
+
 	for_each_irq_pin(entry, mpd->irq_2_pin)
 		__ioapic_write_entry(entry->apic, entry->pin, mpd->entry);
 }
@@ -2949,15 +2965,14 @@ static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data,
 	}
 }
 
-static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
-			   struct IO_APIC_route_entry *entry)
+static void mp_setup_entry(struct irq_data *irq_data, struct mp_chip_data *data)
 {
+	struct IO_APIC_route_entry *entry = &data->entry;
+
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode = apic->irq_delivery_mode;
-	entry->dest_mode     = apic->irq_dest_mode;
-	entry->dest	     = cfg->dest_apicid & 0xff;
-	entry->virt_ext_dest = cfg->dest_apicid >> 8;
-	entry->vector	     = cfg->vector;
+
+	mp_swizzle_msi_dest_bits(irq_data, entry);
+
 	entry->trigger	     = data->trigger;
 	entry->polarity	     = data->polarity;
 	/*
@@ -2995,7 +3010,6 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	if (!data)
 		return -ENOMEM;
 
-	info->ioapic.entry = &data->entry;
 	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info);
 	if (ret < 0) {
 		kfree(data);
@@ -3013,8 +3027,7 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	add_pin_to_irq_node(data, ioapic_alloc_attr_node(info), ioapic, pin);
 
 	local_irq_save(flags);
-	if (info->ioapic.entry)
-		mp_setup_entry(cfg, data, info->ioapic.entry);
+	mp_setup_entry(irq_data, data);
 	mp_register_handler(virq, data->trigger);
 	if (virq < nr_legacy_irqs())
 		legacy_pic->mask(virq);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ef64e01f66d7..13d0a8f42d56 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3597,7 +3597,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 {
 	struct irq_2_irte *irte_info = &data->irq_2_irte;
 	struct msi_msg *msg = &data->msi_entry;
-	struct IO_APIC_route_entry *entry;
 	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
 	if (!iommu)
@@ -3611,19 +3610,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-		/* Setup IOAPIC entry */
-		entry = info->ioapic.entry;
-		info->ioapic.entry = NULL;
-		memset(entry, 0, sizeof(*entry));
-		entry->vector        = index;
-		entry->mask          = 0;
-		entry->trigger       = info->ioapic.trigger;
-		entry->polarity      = info->ioapic.polarity;
-		/* Mask level triggered irqs. */
-		if (info->ioapic.trigger)
-			entry->mask = 1;
-		break;
-
 	case X86_IRQ_ALLOC_TYPE_HPET:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e09e2d734c57..37dd485a5640 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -40,7 +40,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 {
 	struct irq_data *parent = data->parent_data;
 	struct irq_cfg *cfg = irqd_cfg(data);
-	struct IO_APIC_route_entry *entry;
 	int ret;
 
 	/* Return error If new irq affinity is out of ioapic_max_cpumask. */
@@ -51,9 +50,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
 		return ret;
 
-	entry = data->chip_data;
-	entry->dest = cfg->dest_apicid;
-	entry->vector = cfg->vector;
 	send_cleanup_vector(cfg);
 
 	return 0;
@@ -89,20 +85,6 @@ static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
 
 	irq_data->chip = &hyperv_ir_chip;
 
-	/*
-	 * If there is interrupt remapping function of IOMMU, setting irq
-	 * affinity only needs to change IRTE of IOMMU. But Hyper-V doesn't
-	 * support interrupt remapping function, setting irq affinity of IO-APIC
-	 * interrupts still needs to change IO-APIC registers. But ioapic_
-	 * configure_entry() will ignore value of cfg->vector and cfg->
-	 * dest_apicid when IO-APIC's parent irq domain is not the vector
-	 * domain.(See ioapic_configure_entry()) In order to setting vector
-	 * and dest_apicid to IO-APIC register, IO-APIC entry pointer is saved
-	 * in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_
-	 * affinity() set vector and dest_apicid directly into IO-APIC entry.
-	 */
-	irq_data->chip_data = info->ioapic.entry;
-
 	/*
 	 * Hypver-V IO APIC irq affinity should be in the scope of
 	 * ioapic_max_cpumask because no irq remapping support.
@@ -119,22 +101,9 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain,
 	irq_domain_free_irqs_common(domain, virq, nr_irqs);
 }
 
-static int hyperv_irq_remapping_activate(struct irq_domain *domain,
-			  struct irq_data *irq_data, bool reserve)
-{
-	struct irq_cfg *cfg = irqd_cfg(irq_data);
-	struct IO_APIC_route_entry *entry = irq_data->chip_data;
-
-	entry->dest = cfg->dest_apicid;
-	entry->vector = cfg->vector;
-
-	return 0;
-}
-
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
 	.alloc = hyperv_irq_remapping_alloc,
 	.free = hyperv_irq_remapping_free,
-	.activate = hyperv_irq_remapping_activate,
 };
 
 static int __init hyperv_prepare_irq_remapping(void)
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 0cfce1d3b7bb..511dfb4884bc 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1265,7 +1265,6 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 					     struct irq_alloc_info *info,
 					     int index, int sub_handle)
 {
-	struct IR_IO_APIC_route_entry *entry;
 	struct irte *irte = &data->irte_entry;
 	struct msi_msg *msg = &data->msi_entry;
 
@@ -1281,23 +1280,15 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 			irte->avail, irte->vector, irte->dest_id,
 			irte->sid, irte->sq, irte->svt);
 
-		entry = (struct IR_IO_APIC_route_entry *)info->ioapic.entry;
-		info->ioapic.entry = NULL;
-		memset(entry, 0, sizeof(*entry));
-		entry->index2	= (index >> 15) & 0x1;
-		entry->zero	= 0;
-		entry->format	= 1;
-		entry->index	= (index & 0x7fff);
 		/*
 		 * IO-APIC RTE will be configured with virtual vector.
 		 * irq handler will do the explicit EOI to the io-apic.
 		 */
-		entry->vector	= info->ioapic.pin;
-		entry->mask	= 0;			/* enable IRQ */
-		entry->trigger	= info->ioapic.trigger;
-		entry->polarity	= info->ioapic.polarity;
-		if (info->ioapic.trigger)
-			entry->mask = 1; /* Mask level triggered irqs. */
+		msg->data = info->ioapic.pin;
+		msg->address_hi = MSI_ADDR_BASE_HI;
+		msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
+				  MSI_ADDR_IR_INDEX1(index) |
+				  MSI_ADDR_IR_INDEX2(index);
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
-- 
2.17.1







[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE
  2020-10-08  9:12     ` Peter Zijlstra
@ 2020-10-08 17:05       ` David Woodhouse
  0 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-08 17:05 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel


[-- Attachment #1: Type: text/plain, Size: 1326 bytes --]

On Thu, 2020-10-08 at 11:12 +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2020 at 01:20:44PM +0100, David Woodhouse wrote:
> > @@ -1861,7 +1863,8 @@ static void ioapic_configure_entry(struct irq_data *irqd)
> >  	 * ioapic chip to verify that.
> >  	 */
> >  	if (irqd->chip == &ioapic_chip) {
> > -		mpd->entry.dest = cfg->dest_apicid;
> > +		mpd->entry.dest = cfg->dest_apicid & 0xff;
> > +		mpd->entry.ext_dest = cfg->dest_apicid >> 8;
> >  		mpd->entry.vector = cfg->vector;
> >  	}
> >  	for_each_irq_pin(entry, mpd->irq_2_pin)
> 
> All the other sites did memset(0) before the assignment, and this the
> extra unconditional write of 0 to ext_dest is harmless.
> 
> This might be true for this site too, but it wasn't immediately obvious.

Yeah. Really, that whole 16-bit field ought to be called
'msi_addr_bits_19_to_4' since there's no interpretation by the IOAPIC
and it's *just* passed on in the MSI cycle. See my later patch which
stops making them up for ourselves in the IOAPIC code and *just*
swizzles them out of the MSI message which is composed for us by
upstream.

In this case, since this piece of code is based on the knowledge that
the upstream controller is accepting MSI in the x86 Compatibility
Format, those seven bits are never going to have been used for anything
else.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-08 16:08         ` David Woodhouse
@ 2020-10-08 21:14           ` Thomas Gleixner
  2020-10-08 21:39             ` David Woodhouse
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-08 21:14 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Thu, Oct 08 2020 at 17:08, David Woodhouse wrote:
> On Thu, 2020-10-08 at 13:55 +0100, David Woodhouse wrote:
>
> (We'd want the x86_vector_domain to actually have an MSI compose
> function in the !CONFIG_PCI_MSI case if we did this, of course.)

The compose function and the vector domain wrapper can simply move to vector.c

> From 2fbc79588d4677ee1cc9df661162fcf1a7da57f0 Mon Sep 17 00:00:00 2001
> From: David Woodhouse <dwmw@amazon.co.uk>
> Date: Thu, 8 Oct 2020 15:44:42 +0100
> Subject: [PATCH 6/5] x86/ioapic: Generate RTE directly from parent irqchip's MSI
>  message
>
> The IOAPIC generates an MSI cycle with address/data bits taken from its
> Redirection Table Entry in some combination which used to make sense,
> but now is just a bunch of bits which get passed through in some
> seemingly arbitrary order.
>
> Instead of making IRQ remapping drivers directly frob the IOAPIC RTE,
> let them just do their job and generate an MSI message. The bit
> swizzling to turn that MSI message into the IOAPIC's RTE is the same in
> all cases, since it's a function of the IOAPIC hardware. The IRQ
> remappers have no real need to get involved with that.
>
> The only slight caveat is that the IOAPIC is interpreting some of
> those fields too, and it does want the 'vector' field to be unique
> to make EOI work. The AMD IOMMU happens to put its IRTE index in the
> bits that the IOAPIC thinks are the vector field, and accommodates
> this requirement by reserving the first 32 indices for the IOAPIC.
> The Intel IOMMU doesn't actually use the bits that the IOAPIC thinks
> are the vector field, so it fills in the 'pin' value there instead.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/include/asm/hw_irq.h       | 11 +++---
>  arch/x86/include/asm/msidef.h       |  2 ++
>  arch/x86/kernel/apic/io_apic.c      | 55 ++++++++++++++++++-----------
>  drivers/iommu/amd/iommu.c           | 14 --------
>  drivers/iommu/hyperv-iommu.c        | 31 ----------------
>  drivers/iommu/intel/irq_remapping.c | 19 +++-------
>  6 files changed, 46 insertions(+), 86 deletions(-)

Nice :)

> +static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, void *_entry)
> +{
> +	struct msi_msg msg;
> +	u32 *entry = _entry;
> +
> +	irq_chip_compose_msi_msg(irq_data, &msg);

Duh, for some stupid reason it never occured to me to do it that
way.

Historically the MSI compose function was part of the MSI PCI chip and I
just changed that recently when I reworked the code to make IMS support
possible.

Historical blinders are pretty sticky :(

> +	/*
> +	 * They're in a bit of a random order for historical reasons, but
> +	 * the IO/APIC is just a device for turning interrupt lines into
> +	 * MSIs, and various bits of the MSI addr/data are just swizzled
> +	 * into/from the bits of Redirection Table Entry.
> +	 */
> +	entry[0] &= 0xfffff000;
> +	entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK |
> +				 MSI_DATA_VECTOR_MASK));
> +	entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9;
> +
> +	entry[1] &= 0xffff;
> +	entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12;
> +}

....

>  	switch (info->type) {
>  	case X86_IRQ_ALLOC_TYPE_IOAPIC:
> -		/* Setup IOAPIC entry */
> -		entry = info->ioapic.entry;
> -		info->ioapic.entry = NULL;
> -		memset(entry, 0, sizeof(*entry));
> -		entry->vector        = index;
> -		entry->mask          = 0;
> -		entry->trigger       = info->ioapic.trigger;
> -		entry->polarity      = info->ioapic.polarity;
> -		/* Mask level triggered irqs. */
> -		if (info->ioapic.trigger)
> -			entry->mask = 1;
> -		break;
> -

Thanks for cleaning this up!

       tglx


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-08 21:14           ` Thomas Gleixner
@ 2020-10-08 21:39             ` David Woodhouse
  2020-10-08 23:27               ` Thomas Gleixner
  0 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2020-10-08 21:39 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel


[-- Attachment #1: Type: text/plain, Size: 682 bytes --]

On Thu, 2020-10-08 at 23:14 +0200, Thomas Gleixner wrote:
> On Thu, Oct 08 2020 at 17:08, David Woodhouse wrote:
> > On Thu, 2020-10-08 at 13:55 +0100, David Woodhouse wrote:
> > 
> > (We'd want the x86_vector_domain to actually have an MSI compose
> > function in the !CONFIG_PCI_MSI case if we did this, of course.)
> 
> The compose function and the vector domain wrapper can simply move to
> vector.c

I ended up putting __irq_msi_compose_msg() into apic.c and that way I
can make virt_ext_dest_id static in that file.

And then I can move all the HPET-MSI support into hpet.c too.

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-08 21:39             ` David Woodhouse
@ 2020-10-08 23:27               ` Thomas Gleixner
  2020-10-09  6:07                 ` David Woodhouse
  2020-10-10 10:06                 ` David Woodhouse
  0 siblings, 2 replies; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-08 23:27 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Thu, Oct 08 2020 at 22:39, David Woodhouse wrote:
> On Thu, 2020-10-08 at 23:14 +0200, Thomas Gleixner wrote:
>> > 
>> > (We'd want the x86_vector_domain to actually have an MSI compose
>> > function in the !CONFIG_PCI_MSI case if we did this, of course.)
>> 
>> The compose function and the vector domain wrapper can simply move to
>> vector.c
>
> I ended up putting __irq_msi_compose_msg() into apic.c and that way I
> can make virt_ext_dest_id static in that file.
>
> And then I can move all the HPET-MSI support into hpet.c too.

Works for me.

> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id

For the next submission, can you please

 - pick up the -ENODEV changes for HPET/IOAPIC which I posted earlier

 - shuffle all that compose/IOAPIC cleanup around

before adding that extended dest id stuff.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-08 23:27               ` Thomas Gleixner
@ 2020-10-09  6:07                 ` David Woodhouse
  2020-10-10 10:06                 ` David Woodhouse
  1 sibling, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-09  6:07 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel



On 9 October 2020 00:27:06 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Thu, Oct 08 2020 at 22:39, David Woodhouse wrote:
>> On Thu, 2020-10-08 at 23:14 +0200, Thomas Gleixner wrote:
>>> > 
>>> > (We'd want the x86_vector_domain to actually have an MSI compose
>>> > function in the !CONFIG_PCI_MSI case if we did this, of course.)
>>> 
>>> The compose function and the vector domain wrapper can simply move
>to
>>> vector.c
>>
>> I ended up putting __irq_msi_compose_msg() into apic.c and that way I
>> can make virt_ext_dest_id static in that file.
>>
>> And then I can move all the HPET-MSI support into hpet.c too.
>
>Works for me.
>
>>
>https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id
>
>For the next submission, can you please
>
> - pick up the -ENODEV changes for HPET/IOAPIC which I posted earlier

Ack.

> - shuffle all that compose/IOAPIC cleanup around

I'd prefer the MSI swizzling bit to stay last in the series. I want to stare hard at the hyperv-iommu part a bit more, and ideally even have it tested. I'd prefer the real functionality that I care about, not to depend on that cleanup.

If it actually let me remove all mention of ext_dest_id from the IOAPIC code and use *only* MSI swizzling, I'd be keener to reorder. But as noted, there are a couple of manual RTE constructions in there still anyway.

I can move __irq_compose_msi_msg() earlier in the series though, and then virt_ext_dest_id can be static in apic.c from its inception.

OK?

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported
  2020-10-07 12:20 [PATCH 0/5] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
  2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
@ 2020-10-09 10:46 ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 1/8] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
                     ` (7 more replies)
  1 sibling, 8 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv


Fix the conditions for enabling x2apic on guests without interrupt 
remapping, and support 15-bit Extended Destination ID to allow 32768 
CPUs without IR on hypervisors that support it.

The last patch in the series now makes io_apic.c generate its RTE from 
the MSI message created by the parent irqchip, and removes all the nasty 
hackery where IRQ remapping drivers would frob I/OAPIC RTEs for 
themselves directly. It's last because I'd quite like to see it tested 
especially with Hyper-V and it doesn't actually eliminate the need for 
io_apic.c to know about the 15-bit extension anyway.

v2:
 • Minor cleanups.
 • Move __irq_msi_compose_msg() to apic.c, make virt_ext_dest_id static.
 • Generate I/OAPIC RTE directly from parent irqchip's MSI messages.
 • Clean up HPET MSI support into hpet.c now that we can.

David Woodhouse (8):
      x86/apic: Fix x2apic enablement without interrupt remapping
      x86/msi: Only use high bits of MSI address for DMAR unit
      x86/apic: Always provide irq_compose_msi_msg() method for vector domain
      x86/ioapic: Handle Extended Destination ID field in RTE
      x86/apic: Support 15 bits of APIC ID in MSI where available
      x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
      x86/hpet: Move MSI support into hpet.c
      x86/ioapic: Generate RTE directly from parent irqchip's MSI message

 Documentation/virt/kvm/cpuid.rst     |   4 +
 arch/x86/include/asm/apic.h          |   9 +--
 arch/x86/include/asm/hpet.h          |  11 ---
 arch/x86/include/asm/hw_irq.h        |  11 ++-
 arch/x86/include/asm/io_apic.h       |   3 +-
 arch/x86/include/asm/msidef.h        |   2 +
 arch/x86/include/asm/x86_init.h      |   2 +
 arch/x86/include/uapi/asm/kvm_para.h |   1 +
 arch/x86/kernel/apic/apic.c          |  68 ++++++++++++++--
 arch/x86/kernel/apic/io_apic.c       |  66 +++++++++------
 arch/x86/kernel/apic/msi.c           | 152 +++--------------------------------
 arch/x86/kernel/apic/vector.c        |   6 ++
 arch/x86/kernel/apic/x2apic_phys.c   |   9 +++
 arch/x86/kernel/hpet.c               | 116 ++++++++++++++++++++++++--
 arch/x86/kernel/kvm.c                |   6 ++
 arch/x86/kernel/x86_init.c           |   1 +
 drivers/iommu/amd/iommu.c            |  14 ----
 drivers/iommu/hyperv-iommu.c         |  31 -------
 drivers/iommu/intel/irq_remapping.c  |  19 ++---
 19 files changed, 276 insertions(+), 255 deletions(-)


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 1/8] x86/apic: Fix x2apic enablement without interrupt remapping
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 2/8] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

Currently, Linux as a hypervisor guest will enable x2apic only if there
are no CPUs present at boot time with an APIC ID above 255.

Hotplugging a CPU later with a higher APIC ID would result in a CPU
which cannot be targeted by external interrupts.

Add a filter in x2apic_apic_id_valid() which can be used to prevent
such CPUs from coming online, and allow x2apic to be enabled even if
they are present at boot time.

Fixes: ce69a784504 ("x86/apic: Enable x2APIC without interrupt remapping under KVM")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/apic.h        |  1 +
 arch/x86/kernel/apic/apic.c        | 14 ++++++++------
 arch/x86/kernel/apic/x2apic_phys.c |  9 +++++++++
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 1c129abb7f09..b0fd204e0023 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -259,6 +259,7 @@ static inline u64 native_x2apic_icr_read(void)
 
 extern int x2apic_mode;
 extern int x2apic_phys;
+extern void __init x2apic_set_max_apicid(u32 apicid);
 extern void __init check_x2apic(void);
 extern void x2apic_setup(void);
 static inline int x2apic_enabled(void)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b3eef1d5c903..113f6ca7b828 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1841,20 +1841,22 @@ static __init void try_to_enable_x2apic(int remap_mode)
 		return;
 
 	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
-		/* IR is required if there is APIC ID > 255 even when running
-		 * under KVM
+		/*
+		 * Using X2APIC without IR is not architecturally supported
+		 * on bare metal but may be supported in guests.
 		 */
-		if (max_physical_apicid > 255 ||
-		    !x86_init.hyper.x2apic_available()) {
+		if (!x86_init.hyper.x2apic_available()) {
 			pr_info("x2apic: IRQ remapping doesn't support X2APIC mode\n");
 			x2apic_disable();
 			return;
 		}
 
 		/*
-		 * without IR all CPUs can be addressed by IOAPIC/MSI
-		 * only in physical mode
+		 * Without IR, all CPUs can be addressed by IOAPIC/MSI only
+		 * in physical mode, and CPUs with an APIC ID that cannnot
+		 * be addressed must not be brought online.
 		 */
+		x2apic_set_max_apicid(255);
 		x2apic_phys = 1;
 	}
 	x2apic_enable();
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index bc9693841353..e14eae6d6ea7 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -8,6 +8,12 @@
 int x2apic_phys;
 
 static struct apic apic_x2apic_phys;
+static u32 x2apic_max_apicid __ro_after_init;
+
+void __init x2apic_set_max_apicid(u32 apicid)
+{
+	x2apic_max_apicid = apicid;
+}
 
 static int __init set_x2apic_phys_mode(char *arg)
 {
@@ -98,6 +104,9 @@ static int x2apic_phys_probe(void)
 /* Common x2apic functions, also used by x2apic_cluster */
 int x2apic_apic_id_valid(u32 apicid)
 {
+	if (x2apic_max_apicid && apicid > x2apic_max_apicid)
+		return 0;
+
 	return 1;
 }
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 2/8] x86/msi: Only use high bits of MSI address for DMAR unit
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 1/8] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 3/8] x86/apic: Always provide irq_compose_msi_msg() method for vector domain David Woodhouse
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

The Intel IOMMU has an MSI-like configuration for its interrupt, but
it isn't really MSI. So it gets to abuse the high 32 bits of the address,
and puts the high 24 bits of the extended APIC ID there.

This isn't something that can be used in the general case for real MSIs,
since external devices using the high bits of the address would be
performing writes to actual memory space above 4GiB, not targeted at the
APIC.

Factor the hack out and allow it only to be used when appropriate,
adding a WARN_ON_ONCE() if other MSIs are targeted at an unreachable
APIC ID. That should never happen since the compatibility MSI messages
are not used when Interrupt Remapping is enabled.

The x2apic_enabled() check isn't needed because Linux won't bring up
CPUs with higher APIC IDs unless IR and x2apic are enabled anyway.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kernel/apic/msi.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 6313f0a05db7..516df47bde73 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,13 +23,11 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
-static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
+static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+				  bool dmar)
 {
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
-	if (x2apic_enabled())
-		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
-
 	msg->address_lo =
 		MSI_ADDR_BASE_LO |
 		((apic->irq_dest_mode == 0) ?
@@ -43,18 +41,29 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 		MSI_DATA_LEVEL_ASSERT |
 		MSI_DATA_DELIVERY_FIXED |
 		MSI_DATA_VECTOR(cfg->vector);
+
+	/*
+	 * Only the IOMMU itself can use the trick of putting destination
+	 * APIC ID into the high bits of the address. Anything else would
+	 * just be writing to memory if it tried that, and needs IR to
+	 * address higher APIC IDs.
+	 */
+	if (dmar)
+		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+	else
+		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
 }
 
 void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
-	__irq_msi_compose_msg(irqd_cfg(data), msg);
+	__irq_msi_compose_msg(irqd_cfg(data), msg, false);
 }
 
 static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 {
 	struct msi_msg msg[2] = { [1] = { }, };
 
-	__irq_msi_compose_msg(cfg, msg);
+	__irq_msi_compose_msg(cfg, msg, false);
 	irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
 }
 
@@ -276,6 +285,17 @@ struct irq_domain *arch_create_remap_msi_irq_domain(struct irq_domain *parent,
 #endif
 
 #ifdef CONFIG_DMAR_TABLE
+/*
+ * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the
+ * high bits of the destination APIC ID. This can't be done in the general
+ * case for MSIs as it would be targeting real memory above 4GiB not the
+ * APIC.
+ */
+static void dmar_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	__irq_msi_compose_msg(irqd_cfg(data), msg, true);
+}
+
 static void dmar_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
 {
 	dmar_msi_write(data->irq, msg);
@@ -288,6 +308,7 @@ static struct irq_chip dmar_msi_controller = {
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_set_affinity	= msi_domain_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_compose_msi_msg	= dmar_msi_compose_msg,
 	.irq_write_msi_msg	= dmar_msi_write_msg,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 3/8] x86/apic: Always provide irq_compose_msi_msg() method for vector domain
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 1/8] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 2/8] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 4/8] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

This shouldn't be dependent on PCI_MSI. HPET and I/OAPIC can deliver
interrupts through MSI without having any PCI in the system at all.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/apic.h   |  8 +++-----
 arch/x86/kernel/apic/apic.c   | 32 +++++++++++++++++++++++++++++++
 arch/x86/kernel/apic/msi.c    | 36 -----------------------------------
 arch/x86/kernel/apic/vector.c |  6 ++++++
 4 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index b0fd204e0023..f5b88fb723bf 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -521,12 +521,10 @@ static inline void apic_smt_update(void) { }
 #endif
 
 struct msi_msg;
+struct irq_cfg;
 
-#ifdef CONFIG_PCI_MSI
-void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg);
-#else
-# define x86_vector_msi_compose_msg NULL
-#endif
+extern void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+				  bool dmar);
 
 extern void ioapic_zap_locks(void);
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 113f6ca7b828..fba3ba383ad2 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -50,6 +50,7 @@
 #include <asm/io_apic.h>
 #include <asm/desc.h>
 #include <asm/hpet.h>
+#include <asm/msidef.h>
 #include <asm/mtrr.h>
 #include <asm/time.h>
 #include <asm/smp.h>
@@ -2480,6 +2481,37 @@ int hard_smp_processor_id(void)
 	return read_apic_id();
 }
 
+void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+			   bool dmar)
+{
+	msg->address_hi = MSI_ADDR_BASE_HI;
+
+	msg->address_lo =
+		MSI_ADDR_BASE_LO |
+		((apic->irq_dest_mode == 0) ?
+			MSI_ADDR_DEST_MODE_PHYSICAL :
+			MSI_ADDR_DEST_MODE_LOGICAL) |
+		MSI_ADDR_REDIRECTION_CPU |
+		MSI_ADDR_DEST_ID(cfg->dest_apicid);
+
+	msg->data =
+		MSI_DATA_TRIGGER_EDGE |
+		MSI_DATA_LEVEL_ASSERT |
+		MSI_DATA_DELIVERY_FIXED |
+		MSI_DATA_VECTOR(cfg->vector);
+
+	/*
+	 * Only the IOMMU itself can use the trick of putting destination
+	 * APIC ID into the high bits of the address. Anything else would
+	 * just be writing to memory if it tried that, and needs IR to
+	 * address higher APIC IDs.
+	 */
+	if (dmar)
+		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+	else
+		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
+}
+
 /*
  * Override the generic EOI implementation with an optimized version.
  * Only called during early boot when only one CPU is active and with
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 516df47bde73..de585cfa4d6c 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,42 +23,6 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
-static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
-				  bool dmar)
-{
-	msg->address_hi = MSI_ADDR_BASE_HI;
-
-	msg->address_lo =
-		MSI_ADDR_BASE_LO |
-		((apic->irq_dest_mode == 0) ?
-			MSI_ADDR_DEST_MODE_PHYSICAL :
-			MSI_ADDR_DEST_MODE_LOGICAL) |
-		MSI_ADDR_REDIRECTION_CPU |
-		MSI_ADDR_DEST_ID(cfg->dest_apicid);
-
-	msg->data =
-		MSI_DATA_TRIGGER_EDGE |
-		MSI_DATA_LEVEL_ASSERT |
-		MSI_DATA_DELIVERY_FIXED |
-		MSI_DATA_VECTOR(cfg->vector);
-
-	/*
-	 * Only the IOMMU itself can use the trick of putting destination
-	 * APIC ID into the high bits of the address. Anything else would
-	 * just be writing to memory if it tried that, and needs IR to
-	 * address higher APIC IDs.
-	 */
-	if (dmar)
-		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
-	else
-		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
-}
-
-void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
-{
-	__irq_msi_compose_msg(irqd_cfg(data), msg, false);
-}
-
 static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 {
 	struct msi_msg msg[2] = { [1] = { }, };
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 1eac53632786..bb2e2a2488a5 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -818,6 +818,12 @@ void apic_ack_edge(struct irq_data *irqd)
 	apic_ack_irq(irqd);
 }
 
+static void x86_vector_msi_compose_msg(struct irq_data *data,
+				       struct msi_msg *msg)
+{
+       __irq_msi_compose_msg(irqd_cfg(data), msg, false);
+}
+
 static struct irq_chip lapic_controller = {
 	.name			= "APIC",
 	.irq_ack		= apic_ack_edge,
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 4/8] x86/ioapic: Handle Extended Destination ID field in RTE
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
                     ` (2 preceding siblings ...)
  2020-10-09 10:46   ` [PATCH v2 3/8] x86/apic: Always provide irq_compose_msi_msg() method for vector domain David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 5/8] x86/apic: Support 15 bits of APIC ID in MSI where available David Woodhouse
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

Bits 63-48 of the I/OAPIC Redirection Table Entry map directly to
bits 19-4 of the address used in the resulting MSI cycle.

Historically, the x86 MSI format only used the top 8 of those 16 bits as
the destination APIC ID, and the "Extended Destination ID" in the lower
8 bits was unused.

With interrupt remapping, the lowest bit of the Extended Destination ID
(bit 48 of RTE, bit 4 of MSI address) is now used to indicate a
remappable format MSI.

A hypervisor can use the other 7 bits of the Extended Destination ID to
permit guests to address up to 15 bits of APIC IDs, thus allowing 32768
vCPUs before having to expose a vIOMMU and interrupt remapping to the
guest.

This enlightenment could theoretically be transparent to the I/OAPIC
code if it were always generating its RTE from an MSI message created by
the parent irqchip. That cleanup will happen separately but doesn't cover
all cases — for the ExtINT hackery and restoring boot mode, RTEs are
still generated locally. So we have to teach the I/OAPIC about the
ext_dest bits anyway.

No behavioural change in this patch, since nothing yet permits APIC IDs
above 255 to be used with the non-IR I/OAPIC domain.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/io_apic.h |  3 ++-
 arch/x86/kernel/apic/io_apic.c | 19 +++++++++++++------
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index a1a26f6d3aa4..5bb3cf4ff2fd 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -78,7 +78,8 @@ struct IO_APIC_route_entry {
 		mask		:  1,	/* 0: enabled, 1: disabled */
 		__reserved_2	: 15;
 
-	__u32	__reserved_3	: 24,
+	__u32	__reserved_3	: 17,
+		virt_ext_dest	:  7,
 		dest		:  8;
 } __attribute__ ((packed));
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index a33380059db6..54f6a029b1d1 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1239,10 +1239,10 @@ static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
 			       buf, (ir_entry->index2 << 15) | ir_entry->index,
 			       ir_entry->zero);
 		else
-			printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n",
+			printk(KERN_DEBUG "%s, %s, D(%02X%02X), M(%1d)\n",
 			       buf,
 			       entry.dest_mode == IOAPIC_DEST_MODE_LOGICAL ?
-			       "logical " : "physical",
+			       "logical " : "physical", entry.virt_ext_dest,
 			       entry.dest, entry.delivery_mode);
 	}
 }
@@ -1410,6 +1410,7 @@ void native_restore_boot_irq_mode(void)
 	 */
 	if (ioapic_i8259.pin != -1) {
 		struct IO_APIC_route_entry entry;
+		u32 apic_id = read_apic_id();
 
 		memset(&entry, 0, sizeof(entry));
 		entry.mask		= IOAPIC_UNMASKED;
@@ -1417,7 +1418,8 @@ void native_restore_boot_irq_mode(void)
 		entry.polarity		= IOAPIC_POL_HIGH;
 		entry.dest_mode		= IOAPIC_DEST_MODE_PHYSICAL;
 		entry.delivery_mode	= dest_ExtINT;
-		entry.dest		= read_apic_id();
+		entry.dest		= apic_id & 0xff;
+		entry.virt_ext_dest	= apic_id >> 8;
 
 		/*
 		 * Add it to the IO-APIC irq-routing table:
@@ -1861,7 +1863,8 @@ static void ioapic_configure_entry(struct irq_data *irqd)
 	 * ioapic chip to verify that.
 	 */
 	if (irqd->chip == &ioapic_chip) {
-		mpd->entry.dest = cfg->dest_apicid;
+		mpd->entry.dest = cfg->dest_apicid & 0xff;
+		mpd->entry.virt_ext_dest = cfg->dest_apicid >> 8;
 		mpd->entry.vector = cfg->vector;
 	}
 	for_each_irq_pin(entry, mpd->irq_2_pin)
@@ -2027,6 +2030,7 @@ static inline void __init unlock_ExtINT_logic(void)
 	int apic, pin, i;
 	struct IO_APIC_route_entry entry0, entry1;
 	unsigned char save_control, save_freq_select;
+	u32 apic_id;
 
 	pin  = find_isa_irq_pin(8, mp_INT);
 	if (pin == -1) {
@@ -2042,11 +2046,13 @@ static inline void __init unlock_ExtINT_logic(void)
 	entry0 = ioapic_read_entry(apic, pin);
 	clear_IO_APIC_pin(apic, pin);
 
+	apic_id = hard_smp_processor_id();
 	memset(&entry1, 0, sizeof(entry1));
 
 	entry1.dest_mode = IOAPIC_DEST_MODE_PHYSICAL;
 	entry1.mask = IOAPIC_UNMASKED;
-	entry1.dest = hard_smp_processor_id();
+	entry1.dest = apic_id & 0xff;
+	entry1.virt_ext_dest = apic_id >> 8;
 	entry1.delivery_mode = dest_ExtINT;
 	entry1.polarity = entry0.polarity;
 	entry1.trigger = IOAPIC_EDGE;
@@ -2949,7 +2955,8 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 	memset(entry, 0, sizeof(*entry));
 	entry->delivery_mode = apic->irq_delivery_mode;
 	entry->dest_mode     = apic->irq_dest_mode;
-	entry->dest	     = cfg->dest_apicid;
+	entry->dest	     = cfg->dest_apicid & 0xff;
+	entry->virt_ext_dest = cfg->dest_apicid >> 8;
 	entry->vector	     = cfg->vector;
 	entry->trigger	     = data->trigger;
 	entry->polarity	     = data->polarity;
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 5/8] x86/apic: Support 15 bits of APIC ID in MSI where available
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
                     ` (3 preceding siblings ...)
  2020-10-09 10:46   ` [PATCH v2 4/8] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 6/8] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

Some hypervisors can allow the guest to use the Extended Destination ID
field in the MSI address to address up to 32768 CPUs.

This applies to all downstream devices which generate MSI cycles,
including HPET, I/OAPIC and PCI MSI.

HPET and PCI MSI use the same __irq_msi_compose_msg() function, while
I/OAPIC generates its own and had support for the extended bits added in
a previous commit.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/x86_init.h |  2 ++
 arch/x86/kernel/apic/apic.c     | 26 ++++++++++++++++++++++++--
 arch/x86/kernel/x86_init.c      |  1 +
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 397196fae24d..96a719fb2e53 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -114,6 +114,7 @@ struct x86_init_pci {
  * @init_platform:		platform setup
  * @guest_late_init:		guest late init
  * @x2apic_available:		X2APIC detection
+ * @msi_ext_dest_id:		MSI supports 15-bit APIC IDs
  * @init_mem_mapping:		setup early mappings during init_mem_mapping()
  * @init_after_bootmem:		guest init after boot allocator is finished
  */
@@ -121,6 +122,7 @@ struct x86_hyper_init {
 	void (*init_platform)(void);
 	void (*guest_late_init)(void);
 	bool (*x2apic_available)(void);
+	bool (*msi_ext_dest_id)(void);
 	void (*init_mem_mapping)(void);
 	void (*init_after_bootmem)(void);
 };
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index fba3ba383ad2..62e769c267c3 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -94,6 +94,11 @@ static unsigned int disabled_cpu_apicid __ro_after_init = BAD_APICID;
  */
 static int apic_extnmi __ro_after_init = APIC_EXTNMI_BSP;
 
+/*
+ * Hypervisor supports 15 bits of APIC ID in MSI Extended Destination ID
+ */
+static bool virt_ext_dest_id __ro_after_init;
+
 /*
  * Map cpu index to physical APIC ID
  */
@@ -1842,6 +1847,8 @@ static __init void try_to_enable_x2apic(int remap_mode)
 		return;
 
 	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
+		u32 apic_limit = 255;
+
 		/*
 		 * Using X2APIC without IR is not architecturally supported
 		 * on bare metal but may be supported in guests.
@@ -1852,12 +1859,22 @@ static __init void try_to_enable_x2apic(int remap_mode)
 			return;
 		}
 
+		/*
+		 * If the hypervisor supports extended destination ID in
+		 * MSI, that increases the maximum APIC ID that can be
+		 * used for non-remapped IRQ domains.
+		 */
+		if (x86_init.hyper.msi_ext_dest_id()) {
+			virt_ext_dest_id = 1;
+			apic_limit = 32767;
+		}
+
 		/*
 		 * Without IR, all CPUs can be addressed by IOAPIC/MSI only
 		 * in physical mode, and CPUs with an APIC ID that cannnot
 		 * be addressed must not be brought online.
 		 */
-		x2apic_set_max_apicid(255);
+		x2apic_set_max_apicid(apic_limit);
 		x2apic_phys = 1;
 	}
 	x2apic_enable();
@@ -2504,10 +2521,15 @@ void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 	 * Only the IOMMU itself can use the trick of putting destination
 	 * APIC ID into the high bits of the address. Anything else would
 	 * just be writing to memory if it tried that, and needs IR to
-	 * address higher APIC IDs.
+	 * address APICs which can't be addressed in the normal 32-bit
+	 * address range at 0xFFExxxxx. That is typically just 8 bits, but
+	 * some hypervisors allow the extended destination ID field in bits
+	 * 11-5 to be used, giving support for 15 bits of APIC IDs in total.
 	 */
 	if (dmar)
 		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+	else if (virt_ext_dest_id && cfg->dest_apicid < 0x8000)
+		msg->address_lo |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid) >> 3;
 	else
 		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
 }
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index a3038d8deb6a..8b395821cb8d 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -110,6 +110,7 @@ struct x86_init_ops x86_init __initdata = {
 		.init_platform		= x86_init_noop,
 		.guest_late_init	= x86_init_noop,
 		.x2apic_available	= bool_x86_init_noop,
+		.msi_ext_dest_id	= bool_x86_init_noop,
 		.init_mem_mapping	= x86_init_noop,
 		.init_after_bootmem	= x86_init_noop,
 	},
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 6/8] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
                     ` (4 preceding siblings ...)
  2020-10-09 10:46   ` [PATCH v2 5/8] x86/apic: Support 15 bits of APIC ID in MSI where available David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 7/8] x86/hpet: Move MSI support into hpet.c David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message David Woodhouse
  7 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

This allows the host to indicate that MSI emulation supports 15-bit
destination IDs, allowing up to 32768 CPUs without interrupt remapping.

cf. https://patchwork.kernel.org/patch/11816693/ for qemu

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/cpuid.rst     | 4 ++++
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 arch/x86/kernel/kvm.c                | 6 ++++++
 3 files changed, 11 insertions(+)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index a7dff9186bed..1726b5925d2b 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT          14          guest checks this feature bit
                                               async pf acknowledgment msr
                                               0x4b564d07.
 
+KVM_FEATURE_MSI_EXT_DEST_ID       15          guest checks this feature bit
+                                              before using extended destination
+                                              ID bits in MSI address bits 11-5.
+
 KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24          host will warn if no guest-side
                                               per-cpu warps are expeced in
                                               kvmclock
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 812e9b4c1114..950afebfba88 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -32,6 +32,7 @@
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
 #define KVM_FEATURE_ASYNC_PF_INT	14
+#define KVM_FEATURE_MSI_EXT_DEST_ID	15
 
 #define KVM_HINTS_REALTIME      0
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 1b51b727b140..4986b4399aef 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -743,12 +743,18 @@ static void __init kvm_init_platform(void)
 	x86_platform.apic_post_init = kvm_apic_init;
 }
 
+static bool __init kvm_msi_ext_dest_id(void)
+{
+	return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID);
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_kvm = {
 	.name			= "KVM",
 	.detect			= kvm_detect,
 	.type			= X86_HYPER_KVM,
 	.init.guest_late_init	= kvm_guest_init,
 	.init.x2apic_available	= kvm_para_available,
+	.init.msi_ext_dest_id	= kvm_msi_ext_dest_id,
 	.init.init_platform	= kvm_init_platform,
 };
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 7/8] x86/hpet: Move MSI support into hpet.c
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
                     ` (5 preceding siblings ...)
  2020-10-09 10:46   ` [PATCH v2 6/8] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message David Woodhouse
  7 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

This isn't really dependent on PCI MSI; it's just generic MSI which is
now supported by the generic x86_vector_domain. Move the HPET MSI
support back into hpet.c with the rest of the HPET support.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/hpet.h |  11 ----
 arch/x86/kernel/apic/msi.c  | 111 ----------------------------------
 arch/x86/kernel/hpet.c      | 116 ++++++++++++++++++++++++++++++++++--
 3 files changed, 111 insertions(+), 127 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 6352dee37cda..ab9f3dd87c80 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -74,17 +74,6 @@ extern void hpet_disable(void);
 extern unsigned int hpet_readl(unsigned int a);
 extern void force_hpet_resume(void);
 
-struct irq_data;
-struct hpet_channel;
-struct irq_domain;
-
-extern void hpet_msi_unmask(struct irq_data *data);
-extern void hpet_msi_mask(struct irq_data *data);
-extern void hpet_msi_write(struct hpet_channel *hc, struct msi_msg *msg);
-extern struct irq_domain *hpet_create_irq_domain(int hpet_id);
-extern int hpet_assign_irq(struct irq_domain *domain,
-			   struct hpet_channel *hc, int dev_num);
-
 #ifdef CONFIG_HPET_EMULATE_RTC
 
 #include <linux/interrupt.h>
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index de585cfa4d6c..74190f83c034 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -341,114 +341,3 @@ void dmar_free_hwirq(int irq)
 	irq_domain_free_irqs(irq, 1);
 }
 #endif
-
-/*
- * MSI message composition
- */
-#ifdef CONFIG_HPET_TIMER
-static inline int hpet_dev_id(struct irq_domain *domain)
-{
-	struct msi_domain_info *info = msi_get_domain_info(domain);
-
-	return (int)(long)info->data;
-}
-
-static void hpet_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
-{
-	hpet_msi_write(irq_data_get_irq_handler_data(data), msg);
-}
-
-static struct irq_chip hpet_msi_controller __ro_after_init = {
-	.name = "HPET-MSI",
-	.irq_unmask = hpet_msi_unmask,
-	.irq_mask = hpet_msi_mask,
-	.irq_ack = irq_chip_ack_parent,
-	.irq_set_affinity = msi_domain_set_affinity,
-	.irq_retrigger = irq_chip_retrigger_hierarchy,
-	.irq_write_msi_msg = hpet_msi_write_msg,
-	.flags = IRQCHIP_SKIP_SET_WAKE,
-};
-
-static int hpet_msi_init(struct irq_domain *domain,
-			 struct msi_domain_info *info, unsigned int virq,
-			 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
-{
-	irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
-	irq_domain_set_info(domain, virq, arg->hwirq, info->chip, NULL,
-			    handle_edge_irq, arg->data, "edge");
-
-	return 0;
-}
-
-static void hpet_msi_free(struct irq_domain *domain,
-			  struct msi_domain_info *info, unsigned int virq)
-{
-	irq_clear_status_flags(virq, IRQ_MOVE_PCNTXT);
-}
-
-static struct msi_domain_ops hpet_msi_domain_ops = {
-	.msi_init	= hpet_msi_init,
-	.msi_free	= hpet_msi_free,
-};
-
-static struct msi_domain_info hpet_msi_domain_info = {
-	.ops		= &hpet_msi_domain_ops,
-	.chip		= &hpet_msi_controller,
-	.flags		= MSI_FLAG_USE_DEF_DOM_OPS,
-};
-
-struct irq_domain *hpet_create_irq_domain(int hpet_id)
-{
-	struct msi_domain_info *domain_info;
-	struct irq_domain *parent, *d;
-	struct irq_alloc_info info;
-	struct fwnode_handle *fn;
-
-	if (x86_vector_domain == NULL)
-		return NULL;
-
-	domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
-	if (!domain_info)
-		return NULL;
-
-	*domain_info = hpet_msi_domain_info;
-	domain_info->data = (void *)(long)hpet_id;
-
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-	info.devid = hpet_id;
-	parent = irq_remapping_get_irq_domain(&info);
-	if (parent == NULL)
-		parent = x86_vector_domain;
-	else
-		hpet_msi_controller.name = "IR-HPET-MSI";
-
-	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
-					      hpet_id);
-	if (!fn) {
-		kfree(domain_info);
-		return NULL;
-	}
-
-	d = msi_create_irq_domain(fn, domain_info, parent);
-	if (!d) {
-		irq_domain_free_fwnode(fn);
-		kfree(domain_info);
-	}
-	return d;
-}
-
-int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc,
-		    int dev_num)
-{
-	struct irq_alloc_info info;
-
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET;
-	info.data = hc;
-	info.devid = hpet_dev_id(domain);
-	info.hwirq = dev_num;
-
-	return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
-}
-#endif
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 7a50f0b62a70..11fd2676fb1d 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -7,6 +7,7 @@
 #include <linux/cpu.h>
 #include <linux/irq.h>
 
+#include <asm/irq_remapping.h>
 #include <asm/hpet.h>
 #include <asm/time.h>
 
@@ -467,9 +468,8 @@ static void __init hpet_legacy_clockevent_register(struct hpet_channel *hc)
 /*
  * HPET MSI Support
  */
-#ifdef CONFIG_PCI_MSI
-
-void hpet_msi_unmask(struct irq_data *data)
+#ifdef CONFIG_GENERIC_MSI_IRQ
+static void hpet_msi_unmask(struct irq_data *data)
 {
 	struct hpet_channel *hc = irq_data_get_irq_handler_data(data);
 	unsigned int cfg;
@@ -479,7 +479,7 @@ void hpet_msi_unmask(struct irq_data *data)
 	hpet_writel(cfg, HPET_Tn_CFG(hc->num));
 }
 
-void hpet_msi_mask(struct irq_data *data)
+static void hpet_msi_mask(struct irq_data *data)
 {
 	struct hpet_channel *hc = irq_data_get_irq_handler_data(data);
 	unsigned int cfg;
@@ -489,12 +489,118 @@ void hpet_msi_mask(struct irq_data *data)
 	hpet_writel(cfg, HPET_Tn_CFG(hc->num));
 }
 
-void hpet_msi_write(struct hpet_channel *hc, struct msi_msg *msg)
+static void hpet_msi_write(struct hpet_channel *hc, struct msi_msg *msg)
 {
 	hpet_writel(msg->data, HPET_Tn_ROUTE(hc->num));
 	hpet_writel(msg->address_lo, HPET_Tn_ROUTE(hc->num) + 4);
 }
 
+static void hpet_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	hpet_msi_write(irq_data_get_irq_handler_data(data), msg);
+}
+
+static struct irq_chip hpet_msi_controller __ro_after_init = {
+	.name = "HPET-MSI",
+	.irq_unmask = hpet_msi_unmask,
+	.irq_mask = hpet_msi_mask,
+	.irq_ack = irq_chip_ack_parent,
+	.irq_set_affinity = msi_domain_set_affinity,
+	.irq_retrigger = irq_chip_retrigger_hierarchy,
+	.irq_write_msi_msg = hpet_msi_write_msg,
+	.flags = IRQCHIP_SKIP_SET_WAKE,
+};
+
+static int hpet_msi_init(struct irq_domain *domain,
+			 struct msi_domain_info *info, unsigned int virq,
+			 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
+{
+	irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
+	irq_domain_set_info(domain, virq, arg->hwirq, info->chip, NULL,
+			    handle_edge_irq, arg->data, "edge");
+
+	return 0;
+}
+
+static void hpet_msi_free(struct irq_domain *domain,
+			  struct msi_domain_info *info, unsigned int virq)
+{
+	irq_clear_status_flags(virq, IRQ_MOVE_PCNTXT);
+}
+
+static struct msi_domain_ops hpet_msi_domain_ops = {
+	.msi_init	= hpet_msi_init,
+	.msi_free	= hpet_msi_free,
+};
+
+static struct msi_domain_info hpet_msi_domain_info = {
+	.ops		= &hpet_msi_domain_ops,
+	.chip		= &hpet_msi_controller,
+	.flags		= MSI_FLAG_USE_DEF_DOM_OPS,
+};
+
+static struct irq_domain *hpet_create_irq_domain(int hpet_id)
+{
+	struct msi_domain_info *domain_info;
+	struct irq_domain *parent, *d;
+	struct irq_alloc_info info;
+	struct fwnode_handle *fn;
+
+	if (x86_vector_domain == NULL)
+		return NULL;
+
+	domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
+	if (!domain_info)
+		return NULL;
+
+	*domain_info = hpet_msi_domain_info;
+	domain_info->data = (void *)(long)hpet_id;
+
+	init_irq_alloc_info(&info, NULL);
+	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
+	info.devid = hpet_id;
+	parent = irq_remapping_get_irq_domain(&info);
+	if (parent == NULL)
+		parent = x86_vector_domain;
+	else
+		hpet_msi_controller.name = "IR-HPET-MSI";
+
+	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
+					      hpet_id);
+	if (!fn) {
+		kfree(domain_info);
+		return NULL;
+	}
+
+	d = msi_create_irq_domain(fn, domain_info, parent);
+	if (!d) {
+		irq_domain_free_fwnode(fn);
+		kfree(domain_info);
+	}
+	return d;
+}
+
+static inline int hpet_dev_id(struct irq_domain *domain)
+{
+	struct msi_domain_info *info = msi_get_domain_info(domain);
+
+	return (int)(long)info->data;
+}
+
+static int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc,
+			   int dev_num)
+{
+	struct irq_alloc_info info;
+
+	init_irq_alloc_info(&info, NULL);
+	info.type = X86_IRQ_ALLOC_TYPE_HPET;
+	info.data = hc;
+	info.devid = hpet_dev_id(domain);
+	info.hwirq = dev_num;
+
+	return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
+}
+
 static int hpet_clkevt_msi_resume(struct clock_event_device *evt)
 {
 	struct hpet_channel *hc = clockevent_to_channel(evt);
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
                     ` (6 preceding siblings ...)
  2020-10-09 10:46   ` [PATCH v2 7/8] x86/hpet: Move MSI support into hpet.c David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-22 21:43     ` Thomas Gleixner
  7 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

The I/OAPIC generates an MSI cycle with address/data bits taken from its
Redirection Table Entry in some combination which used to make sense,
but now is just a bunch of bits which get passed through in some
seemingly arbitrary order.

Instead of making IRQ remapping drivers directly frob the I/OAPIC RTE,
let them just do their job and generate an MSI message. The bit
swizzling to turn that MSI message into the IOAPIC's RTE is the same in
all cases, since it's a function of the I/OAPIC hardware. The IRQ
remappers have no real need to get involved with that.

The only slight caveat is that the I/OAPIC is interpreting some of
those fields too, and it does want the 'vector' field to be unique
to make EOI work. The AMD IOMMU happens to put its IRTE index in the
bits that the I/OAPIC thinks are the vector field, and accommodates
this requirement by reserving the first 32 indices for the I/OAPIC.
The Intel IOMMU doesn't actually use the bits that the I/OAPIC thinks
are the vector field, so it fills in the 'pin' value there instead.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/hw_irq.h       | 11 +++---
 arch/x86/include/asm/msidef.h       |  2 ++
 arch/x86/kernel/apic/io_apic.c      | 55 ++++++++++++++++++-----------
 drivers/iommu/amd/iommu.c           | 14 --------
 drivers/iommu/hyperv-iommu.c        | 31 ----------------
 drivers/iommu/intel/irq_remapping.c | 19 +++-------
 6 files changed, 46 insertions(+), 86 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index a4aeeaace040..aabd8f1b6bb0 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -45,12 +45,11 @@ enum irq_alloc_type {
 };
 
 struct ioapic_alloc_info {
-	int				pin;
-	int				node;
-	u32				trigger : 1;
-	u32				polarity : 1;
-	u32				valid : 1;
-	struct IO_APIC_route_entry	*entry;
+	int		pin;
+	int		node;
+	u32		trigger : 1;
+	u32		polarity : 1;
+	u32		valid : 1;
 };
 
 struct uv_alloc_info {
diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index ee2f8ccc32d0..37c3d2d492c9 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -18,6 +18,7 @@
 #define MSI_DATA_DELIVERY_MODE_SHIFT	8
 #define  MSI_DATA_DELIVERY_FIXED	(0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
+#define  MSI_DATA_DELIVERY_MODE_MASK	(3 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
 #define MSI_DATA_LEVEL_SHIFT		14
 #define	 MSI_DATA_LEVEL_DEASSERT	(0 << MSI_DATA_LEVEL_SHIFT)
@@ -37,6 +38,7 @@
 #define MSI_ADDR_DEST_MODE_SHIFT	2
 #define  MSI_ADDR_DEST_MODE_PHYSICAL	(0 << MSI_ADDR_DEST_MODE_SHIFT)
 #define	 MSI_ADDR_DEST_MODE_LOGICAL	(1 << MSI_ADDR_DEST_MODE_SHIFT)
+#define  MSI_ADDR_DEST_MODE_MASK	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
 #define MSI_ADDR_REDIRECTION_SHIFT	3
 #define  MSI_ADDR_REDIRECTION_CPU	(0 << MSI_ADDR_REDIRECTION_SHIFT)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 54f6a029b1d1..ca2da19d5c55 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -48,6 +48,7 @@
 #include <linux/jiffies.h>	/* time_after() */
 #include <linux/slab.h>
 #include <linux/memblock.h>
+#include <linux/msi.h>
 
 #include <asm/irqdomain.h>
 #include <asm/io.h>
@@ -63,6 +64,7 @@
 #include <asm/setup.h>
 #include <asm/irq_remapping.h>
 #include <asm/hw_irq.h>
+#include <asm/msidef.h>
 
 #include <asm/apic.h>
 
@@ -1851,22 +1853,36 @@ static void ioapic_ir_ack_level(struct irq_data *irq_data)
 	eoi_ioapic_pin(data->entry.vector, data);
 }
 
+static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, void *_entry)
+{
+	struct msi_msg msg;
+	u32 *entry = _entry;
+
+	irq_chip_compose_msi_msg(irq_data, &msg);
+
+	/*
+	 * They're in a bit of a random order for historical reasons, but
+	 * the IO/APIC is just a device for turning interrupt lines into
+	 * MSIs, and various bits of the MSI addr/data are just swizzled
+	 * into/from the bits of Redirection Table Entry.
+	 */
+	entry[0] &= 0xfffff000;
+	entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK |
+				 MSI_DATA_VECTOR_MASK));
+	entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9;
+
+	entry[1] &= 0xffff;
+	entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12;
+}
+
+
 static void ioapic_configure_entry(struct irq_data *irqd)
 {
 	struct mp_chip_data *mpd = irqd->chip_data;
-	struct irq_cfg *cfg = irqd_cfg(irqd);
 	struct irq_pin_list *entry;
 
-	/*
-	 * Only update when the parent is the vector domain, don't touch it
-	 * if the parent is the remapping domain. Check the installed
-	 * ioapic chip to verify that.
-	 */
-	if (irqd->chip == &ioapic_chip) {
-		mpd->entry.dest = cfg->dest_apicid & 0xff;
-		mpd->entry.virt_ext_dest = cfg->dest_apicid >> 8;
-		mpd->entry.vector = cfg->vector;
-	}
+	mp_swizzle_msi_dest_bits(irqd, &mpd->entry);
+
 	for_each_irq_pin(entry, mpd->irq_2_pin)
 		__ioapic_write_entry(entry->apic, entry->pin, mpd->entry);
 }
@@ -2949,15 +2965,14 @@ static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data,
 	}
 }
 
-static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
-			   struct IO_APIC_route_entry *entry)
+static void mp_setup_entry(struct irq_data *irq_data, struct mp_chip_data *data)
 {
+	struct IO_APIC_route_entry *entry = &data->entry;
+
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode = apic->irq_delivery_mode;
-	entry->dest_mode     = apic->irq_dest_mode;
-	entry->dest	     = cfg->dest_apicid & 0xff;
-	entry->virt_ext_dest = cfg->dest_apicid >> 8;
-	entry->vector	     = cfg->vector;
+
+	mp_swizzle_msi_dest_bits(irq_data, entry);
+
 	entry->trigger	     = data->trigger;
 	entry->polarity	     = data->polarity;
 	/*
@@ -2995,7 +3010,6 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	if (!data)
 		return -ENOMEM;
 
-	info->ioapic.entry = &data->entry;
 	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info);
 	if (ret < 0) {
 		kfree(data);
@@ -3013,8 +3027,7 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	add_pin_to_irq_node(data, ioapic_alloc_attr_node(info), ioapic, pin);
 
 	local_irq_save(flags);
-	if (info->ioapic.entry)
-		mp_setup_entry(cfg, data, info->ioapic.entry);
+	mp_setup_entry(irq_data, data);
 	mp_register_handler(virq, data->trigger);
 	if (virq < nr_legacy_irqs())
 		legacy_pic->mask(virq);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ef64e01f66d7..13d0a8f42d56 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3597,7 +3597,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 {
 	struct irq_2_irte *irte_info = &data->irq_2_irte;
 	struct msi_msg *msg = &data->msi_entry;
-	struct IO_APIC_route_entry *entry;
 	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
 	if (!iommu)
@@ -3611,19 +3610,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-		/* Setup IOAPIC entry */
-		entry = info->ioapic.entry;
-		info->ioapic.entry = NULL;
-		memset(entry, 0, sizeof(*entry));
-		entry->vector        = index;
-		entry->mask          = 0;
-		entry->trigger       = info->ioapic.trigger;
-		entry->polarity      = info->ioapic.polarity;
-		/* Mask level triggered irqs. */
-		if (info->ioapic.trigger)
-			entry->mask = 1;
-		break;
-
 	case X86_IRQ_ALLOC_TYPE_HPET:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e09e2d734c57..37dd485a5640 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -40,7 +40,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 {
 	struct irq_data *parent = data->parent_data;
 	struct irq_cfg *cfg = irqd_cfg(data);
-	struct IO_APIC_route_entry *entry;
 	int ret;
 
 	/* Return error If new irq affinity is out of ioapic_max_cpumask. */
@@ -51,9 +50,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
 		return ret;
 
-	entry = data->chip_data;
-	entry->dest = cfg->dest_apicid;
-	entry->vector = cfg->vector;
 	send_cleanup_vector(cfg);
 
 	return 0;
@@ -89,20 +85,6 @@ static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
 
 	irq_data->chip = &hyperv_ir_chip;
 
-	/*
-	 * If there is interrupt remapping function of IOMMU, setting irq
-	 * affinity only needs to change IRTE of IOMMU. But Hyper-V doesn't
-	 * support interrupt remapping function, setting irq affinity of IO-APIC
-	 * interrupts still needs to change IO-APIC registers. But ioapic_
-	 * configure_entry() will ignore value of cfg->vector and cfg->
-	 * dest_apicid when IO-APIC's parent irq domain is not the vector
-	 * domain.(See ioapic_configure_entry()) In order to setting vector
-	 * and dest_apicid to IO-APIC register, IO-APIC entry pointer is saved
-	 * in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_
-	 * affinity() set vector and dest_apicid directly into IO-APIC entry.
-	 */
-	irq_data->chip_data = info->ioapic.entry;
-
 	/*
 	 * Hypver-V IO APIC irq affinity should be in the scope of
 	 * ioapic_max_cpumask because no irq remapping support.
@@ -119,22 +101,9 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain,
 	irq_domain_free_irqs_common(domain, virq, nr_irqs);
 }
 
-static int hyperv_irq_remapping_activate(struct irq_domain *domain,
-			  struct irq_data *irq_data, bool reserve)
-{
-	struct irq_cfg *cfg = irqd_cfg(irq_data);
-	struct IO_APIC_route_entry *entry = irq_data->chip_data;
-
-	entry->dest = cfg->dest_apicid;
-	entry->vector = cfg->vector;
-
-	return 0;
-}
-
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
 	.alloc = hyperv_irq_remapping_alloc,
 	.free = hyperv_irq_remapping_free,
-	.activate = hyperv_irq_remapping_activate,
 };
 
 static int __init hyperv_prepare_irq_remapping(void)
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 0cfce1d3b7bb..511dfb4884bc 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1265,7 +1265,6 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 					     struct irq_alloc_info *info,
 					     int index, int sub_handle)
 {
-	struct IR_IO_APIC_route_entry *entry;
 	struct irte *irte = &data->irte_entry;
 	struct msi_msg *msg = &data->msi_entry;
 
@@ -1281,23 +1280,15 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 			irte->avail, irte->vector, irte->dest_id,
 			irte->sid, irte->sq, irte->svt);
 
-		entry = (struct IR_IO_APIC_route_entry *)info->ioapic.entry;
-		info->ioapic.entry = NULL;
-		memset(entry, 0, sizeof(*entry));
-		entry->index2	= (index >> 15) & 0x1;
-		entry->zero	= 0;
-		entry->format	= 1;
-		entry->index	= (index & 0x7fff);
 		/*
 		 * IO-APIC RTE will be configured with virtual vector.
 		 * irq handler will do the explicit EOI to the io-apic.
 		 */
-		entry->vector	= info->ioapic.pin;
-		entry->mask	= 0;			/* enable IRQ */
-		entry->trigger	= info->ioapic.trigger;
-		entry->polarity	= info->ioapic.polarity;
-		if (info->ioapic.trigger)
-			entry->mask = 1; /* Mask level triggered irqs. */
+		msg->data = info->ioapic.pin;
+		msg->address_hi = MSI_ADDR_BASE_HI;
+		msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
+				  MSI_ADDR_IR_INDEX1(index) |
+				  MSI_ADDR_IR_INDEX2(index);
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-08 23:27               ` Thomas Gleixner
  2020-10-09  6:07                 ` David Woodhouse
@ 2020-10-10 10:06                 ` David Woodhouse
  2020-10-10 11:44                   ` Thomas Gleixner
  1 sibling, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2020-10-10 10:06 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel


[-- Attachment #1: Type: text/plain, Size: 1688 bytes --]

On Fri, 2020-10-09 at 01:27 +0200, Thomas Gleixner wrote:
> On Thu, Oct 08 2020 at 22:39, David Woodhouse wrote:
> > On Thu, 2020-10-08 at 23:14 +0200, Thomas Gleixner wrote:
> > > > 
> > > > (We'd want the x86_vector_domain to actually have an MSI compose
> > > > function in the !CONFIG_PCI_MSI case if we did this, of course.)
> > > 
> > > The compose function and the vector domain wrapper can simply move to
> > > vector.c
> > 
> > I ended up putting __irq_msi_compose_msg() into apic.c and that way I
> > can make virt_ext_dest_id static in that file.
> > 
> > And then I can move all the HPET-MSI support into hpet.c too.
> 
> Works for me.
> 
> > https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id
> 
> For the next submission, can you please
> 
>  - pick up the -ENODEV changes for HPET/IOAPIC which I posted earlier

I think the world will be a nicer place if HPET and IOAPIC have their
own struct device and their drivers can just use dev_get_msi_domain().

The IRQ remapping drivers already plug into the device-add notifier and
can fill in the appropriate MSI domain just like they do¹ for PCI and
ACPI devices.

Using platform_add_bundle() for HPET looks trivial enough; I'll have a
play with that and then do IOAPIC too if/when the initialisation order
and hotplug handling all works out OK to install the correct
msi_domain.

-- 
dwmw2

¹ Yeah, I know they don't do it directly; it's done in 
  pcibios_add_device(). But maybe they should? Because yeah, I also 
  know they don't do it for ACPI devices either, because nothing does,
  and IRQ remapping on ANDD-listed devices doesn't work AFAICT.


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-10 10:06                 ` David Woodhouse
@ 2020-10-10 11:44                   ` Thomas Gleixner
  2020-10-10 11:58                     ` David Woodhouse
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-10 11:44 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Sat, Oct 10 2020 at 11:06, David Woodhouse wrote:
> On Fri, 2020-10-09 at 01:27 +0200, Thomas Gleixner wrote:
>> On Thu, Oct 08 2020 at 22:39, David Woodhouse wrote:
>> For the next submission, can you please
>> 
>>  - pick up the -ENODEV changes for HPET/IOAPIC which I posted earlier
>
> I think the world will be a nicer place if HPET and IOAPIC have their
> own struct device and their drivers can just use dev_get_msi_domain().
>
> The IRQ remapping drivers already plug into the device-add notifier and
> can fill in the appropriate MSI domain just like they do¹ for PCI and
> ACPI devices.
>
> Using platform_add_bundle() for HPET looks trivial enough; I'll have a
> play with that and then do IOAPIC too if/when the initialisation order
> and hotplug handling all works out OK to install the correct
> msi_domain.

Yes, I was wondering about that when I made PCI at least use that
mechanism, but had not had time to actually look at it.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-10 11:44                   ` Thomas Gleixner
@ 2020-10-10 11:58                     ` David Woodhouse
  2020-10-11 17:12                       ` Thomas Gleixner
  0 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2020-10-10 11:58 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel



On 10 October 2020 12:44:10 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Sat, Oct 10 2020 at 11:06, David Woodhouse wrote:
>> On Fri, 2020-10-09 at 01:27 +0200, Thomas Gleixner wrote:
>>> On Thu, Oct 08 2020 at 22:39, David Woodhouse wrote:
>>> For the next submission, can you please
>>> 
>>>  - pick up the -ENODEV changes for HPET/IOAPIC which I posted
>earlier
>>
>> I think the world will be a nicer place if HPET and IOAPIC have their
>> own struct device and their drivers can just use
>dev_get_msi_domain().
>>
>> The IRQ remapping drivers already plug into the device-add notifier
>and
>> can fill in the appropriate MSI domain just like they do¹ for PCI and
>> ACPI devices.
>>
>> Using platform_add_bundle() for HPET looks trivial enough; I'll have
>a
>> play with that and then do IOAPIC too if/when the initialisation
>order
>> and hotplug handling all works out OK to install the correct
>> msi_domain.
>
>Yes, I was wondering about that when I made PCI at least use that
>mechanism, but had not had time to actually look at it.

Yeah. There's some muttering to be done for HPET about whether it's *its* MSI domain or whether it's the parent domain. But I'll have a play. I think we'll be able to drop the whole irq_remapping_get_irq_domain() thing.

Either way, it's a separate cleanup and the 15-bit APIC ID series I posted yesterday should be fine as it is. 

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-10 11:58                     ` David Woodhouse
@ 2020-10-11 17:12                       ` Thomas Gleixner
  2020-10-11 21:15                         ` David Woodhouse
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-11 17:12 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Sat, Oct 10 2020 at 12:58, David Woodhouse wrote:
> On 10 October 2020 12:44:10 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
>>On Sat, Oct 10 2020 at 11:06, David Woodhouse wrote:

>>> The IRQ remapping drivers already plug into the device-add notifier
>>> and can fill in the appropriate MSI domain just like they do¹ for
>>> PCI and ACPI devices.
>>> Using platform_add_bundle() for HPET looks trivial enough; I'll have
>>> a play with that and then do IOAPIC too if/when the initialisation
>>> order and hotplug handling all works out OK to install the correct
>>> msi_domain.
>>
>> Yes, I was wondering about that when I made PCI at least use that
>> mechanism, but had not had time to actually look at it.
>
> Yeah. There's some muttering to be done for HPET about whether it's
> *its* MSI domain or whether it's the parent domain. But I'll have a
> play. I think we'll be able to drop the whole
> irq_remapping_get_irq_domain() thing.

That would be really nice.

> Either way, it's a separate cleanup and the 15-bit APIC ID series I
> posted yesterday should be fine as it is.

I go over it in the next days once more and stick it into my devel tree
until rc1. Need to get some conflicts sorted with that Device MSI stuff.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-11 17:12                       ` Thomas Gleixner
@ 2020-10-11 21:15                         ` David Woodhouse
  2020-10-12  9:33                           ` Thomas Gleixner
  0 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2020-10-11 21:15 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel



On 11 October 2020 18:12:08 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Sat, Oct 10 2020 at 12:58, David Woodhouse wrote:
>> On 10 October 2020 12:44:10 BST, Thomas Gleixner <tglx@linutronix.de>
>wrote:
>>>On Sat, Oct 10 2020 at 11:06, David Woodhouse wrote:
>
>>>> The IRQ remapping drivers already plug into the device-add notifier
>>>> and can fill in the appropriate MSI domain just like they do¹ for
>>>> PCI and ACPI devices.
>>>> Using platform_add_bundle() for HPET looks trivial enough; I'll
>have
>>>> a play with that and then do IOAPIC too if/when the initialisation
>>>> order and hotplug handling all works out OK to install the correct
>>>> msi_domain.
>>>
>>> Yes, I was wondering about that when I made PCI at least use that
>>> mechanism, but had not had time to actually look at it.
>>
>> Yeah. There's some muttering to be done for HPET about whether it's
>> *its* MSI domain or whether it's the parent domain. But I'll have a
>> play. I think we'll be able to drop the whole
>> irq_remapping_get_irq_domain() thing.
>
>That would be really nice.

I can make it work for HPET if I fix up the point at which the IRQ remapping code registers a notifier on the platform bus. (At IRQ remap setup time is too early; when it registers the PCI bus notifier is too late.)

IOAPIC is harder though as the platform bus doesn't even exist that early. Maybe an early platform bus is possible but it would have to turn out particularly simple to do, or I'd need to find another use case too, to justify it. Will continue to play....

>> Either way, it's a separate cleanup and the 15-bit APIC ID series I
>> posted yesterday should be fine as it is.
>
>I go over it in the next days once more and stick it into my devel tree
>until rc1. Need to get some conflicts sorted with that Device MSI
>stuff.

While playing with HPET I noticed I need s/CONFIG_PCI_MSI/CONFIG_IRQ_GENERIC_MSI/ where the variables are declared at the top of msi.c to match the change I made later on. Can post v3 of the series or you can silently fix it up as you go; please advise.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-11 21:15                         ` David Woodhouse
@ 2020-10-12  9:33                           ` Thomas Gleixner
  2020-10-12 16:06                             ` David Woodhouse
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-12  9:33 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Sun, Oct 11 2020 at 22:15, David Woodhouse wrote:
> On 11 October 2020 18:12:08 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Sat, Oct 10 2020 at 12:58, David Woodhouse wrote:
>>> On 10 October 2020 12:44:10 BST, Thomas Gleixner <tglx@linutronix.de>
>> wrote:
>>> Yeah. There's some muttering to be done for HPET about whether it's
>>> *its* MSI domain or whether it's the parent domain. But I'll have a
>>> play. I think we'll be able to drop the whole
>>> irq_remapping_get_irq_domain() thing.
>>
>> That would be really nice.
>
> I can make it work for HPET if I fix up the point at which the IRQ
> remapping code registers a notifier on the platform bus. (At IRQ remap
> setup time is too early; when it registers the PCI bus notifier is too
> late.)
>
> IOAPIC is harder though as the platform bus doesn't even exist that
> early. Maybe an early platform bus is possible but it would have to
> turn out particularly simple to do, or I'd need to find another use
> case too, to justify it. Will continue to play....

You might want to look into using irq_find_matching_fwspec() instead for
both HPET and IOAPIC. That needs a select() callback implemented in the
remapping domains.

>> I go over it in the next days once more and stick it into my devel tree
>> until rc1. Need to get some conflicts sorted with that Device MSI
>> stuff.
>
> While playing with HPET I noticed I need
> s/CONFIG_PCI_MSI/CONFIG_IRQ_GENERIC_MSI/ where the variables are
> declared at the top of msi.c to match the change I made later on. Can
> post v3 of the series or you can silently fix it up as you go; please
> advise.

I think I might be able to handle that on my own :)

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-12  9:33                           ` Thomas Gleixner
@ 2020-10-12 16:06                             ` David Woodhouse
  2020-10-12 18:38                               ` Thomas Gleixner
  0 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2020-10-12 16:06 UTC (permalink / raw)
  To: Thomas Gleixner, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel


[-- Attachment #1: Type: text/plain, Size: 17703 bytes --]

On Mon, 2020-10-12 at 11:33 +0200, Thomas Gleixner wrote:
> On Sun, Oct 11 2020 at 22:15, David Woodhouse wrote:
> > On 11 October 2020 18:12:08 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > On Sat, Oct 10 2020 at 12:58, David Woodhouse wrote:
> > > > On 10 October 2020 12:44:10 BST, Thomas Gleixner <tglx@linutronix.de>
> > > 
> > > wrote:
> > > > Yeah. There's some muttering to be done for HPET about whether it's
> > > > *its* MSI domain or whether it's the parent domain. But I'll have a
> > > > play. I think we'll be able to drop the whole
> > > > irq_remapping_get_irq_domain() thing.
> > > 
> > > That would be really nice.
> > 
> > I can make it work for HPET if I fix up the point at which the IRQ
> > remapping code registers a notifier on the platform bus. (At IRQ remap
> > setup time is too early; when it registers the PCI bus notifier is too
> > late.)
> > 
> > IOAPIC is harder though as the platform bus doesn't even exist that
> > early. Maybe an early platform bus is possible but it would have to
> > turn out particularly simple to do, or I'd need to find another use
> > case too, to justify it. Will continue to play....
> 
> You might want to look into using irq_find_matching_fwspec() instead for
> both HPET and IOAPIC. That needs a select() callback implemented in the
> remapping domains.

That works. Pushed (along with that trivial HPET MSI fixup) to 
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id

Briefly tested with I/OAPIC with and without Intel remapping in qemu,
not tested for HPET because I haven't worked out how to get qemu to do
HPET-MSI yet.

David Woodhouse (8):
      genirq/irqdomain: Implement get_name() method on irqchip fwnodes
      x86/apic: Add select() method on vector irqdomain
      iommu/amd: Implement select() method on remapping irqdomain
      iommu/vt-d: Implement select() method on remapping irqdomain
      iommu/hyper-v: Implement select() method on remapping irqdomain
      x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
      x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
      x86: Kill all traces of irq_remapping_get_irq_domain()

 arch/x86/include/asm/hw_irq.h        |  2 --
 arch/x86/include/asm/irq_remapping.h |  9 ------
 arch/x86/include/asm/irqdomain.h     |  3 ++
 arch/x86/kernel/apic/io_apic.c       | 21 ++++++--------
 arch/x86/kernel/apic/vector.c        | 52 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/hpet.c               | 23 +++++++++-------
 drivers/iommu/amd/iommu.c            | 53 +++++++++++++-----------------------
 drivers/iommu/hyperv-iommu.c         | 18 ++++++------
 drivers/iommu/intel/irq_remapping.c  | 30 +++++++++-----------
 drivers/iommu/irq_remapping.c        | 14 ----------
 drivers/iommu/irq_remapping.h        |  3 --
 kernel/irq/irqdomain.c               | 11 +++++++-
 12 files changed, 128 insertions(+), 111 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index aabd8f1b6bb0..aef795f17478 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -40,8 +40,6 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_PCI_MSIX,
 	X86_IRQ_ALLOC_TYPE_DMAR,
 	X86_IRQ_ALLOC_TYPE_UV,
-	X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT,
-	X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
 struct ioapic_alloc_info {
diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index af4a151d70b3..7cc49432187f 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -44,9 +44,6 @@ extern int irq_remapping_reenable(int);
 extern int irq_remap_enable_fault_handling(void);
 extern void panic_if_irq_remap(const char *msg);
 
-extern struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info);
-
 /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
 extern struct irq_domain *
 arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int id);
@@ -71,11 +68,5 @@ static inline void panic_if_irq_remap(const char *msg)
 {
 }
 
-static inline struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info)
-{
-	return NULL;
-}
-
 #endif /* CONFIG_IRQ_REMAP */
 #endif /* __X86_IRQ_REMAPPING_H */
diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
index cd684d45cb5f..2fc85f523ace 100644
--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -12,6 +12,9 @@ enum {
 	X86_IRQ_ALLOC_LEGACY				= 0x2,
 };
 
+int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec);
+int x86_fwspec_is_hpet(struct irq_fwspec *fwspec);
+
 extern struct irq_domain *x86_vector_domain;
 
 extern void init_irq_alloc_info(struct irq_alloc_info *info,
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ca2da19d5c55..f6a5b9f887aa 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2305,36 +2305,33 @@ static inline void __init check_timer(void)
 
 static int mp_irqdomain_create(int ioapic)
 {
-	struct irq_alloc_info info;
 	struct irq_domain *parent;
 	int hwirqs = mp_ioapic_pin_count(ioapic);
 	struct ioapic *ip = &ioapics[ioapic];
 	struct ioapic_domain_cfg *cfg = &ip->irqdomain_cfg;
 	struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(ioapic);
 	struct fwnode_handle *fn;
-	char *name = "IO-APIC";
+	struct irq_fwspec fwspec;
 
 	if (cfg->type == IOAPIC_DOMAIN_INVALID)
 		return 0;
 
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
-	info.devid = mpc_ioapic_id(ioapic);
-	parent = irq_remapping_get_irq_domain(&info);
-	if (!parent)
-		parent = x86_vector_domain;
-	else
-		name = "IO-APIC-IR";
-
 	/* Handle device tree enumerated APICs proper */
 	if (cfg->dev) {
 		fn = of_node_to_fwnode(cfg->dev);
 	} else {
-		fn = irq_domain_alloc_named_id_fwnode(name, ioapic);
+		fn = irq_domain_alloc_named_id_fwnode("IO-APIC", ioapic);
 		if (!fn)
 			return -ENOMEM;
 	}
 
+	fwspec.fwnode = fn;
+	fwspec.param_count = 1;
+	fwspec.param[0] = ioapic;
+	parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+	if (!parent)
+		return -ENODEV;
+
 	ip->irqdomain = irq_domain_create_linear(fn, hwirqs, cfg->ops,
 						 (void *)(long)ioapic);
 
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb2e2a2488a5..3f0485c12b13 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -636,7 +636,59 @@ static void x86_vector_debug_show(struct seq_file *m, struct irq_domain *d,
 }
 #endif
 
+
+int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec)
+{
+	if (fwspec->param_count != 1)
+		return 0;
+
+	if (is_fwnode_irqchip(fwspec->fwnode)){
+		const char *fwname = fwnode_get_name(fwspec->fwnode);
+
+		if (!strncmp(fwname, "IO-APIC-", 8) &&
+		    simple_strtol(fwname+8, NULL, 10) == fwspec->param[0])
+			return 1;
+#ifdef CONFIG_OF
+	} else if (to_of_node(fwspec->fwnode) &&
+		   of_device_is_compatible(to_of_node(fwspec->fwnode),
+					   "intel,ce4100-ioapic")) {
+		return 1;
+#endif
+	}
+	return 0;
+}
+
+int x86_fwspec_is_hpet(struct irq_fwspec *fwspec)
+{
+	if (fwspec->param_count != 1)
+		return 0;
+
+	if (is_fwnode_irqchip(fwspec->fwnode)){
+		const char *fwname = fwnode_get_name(fwspec->fwnode);
+
+		if (!strncmp(fwname, "HPET-MSI-", 9) &&
+		    simple_strtol(fwname+9, NULL, 10) == fwspec->param[0])
+			return 1;
+	}
+	return 0;
+}
+
+static int x86_vector_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+			     enum irq_domain_bus_token bus_token)
+{
+	/*
+	 * HPET and I/OAPIC cannot be parented in the vector domain
+	 * if IRQ remapping is enabled. APIC IDs above 15 bits are
+	 * only permitted if IRQ remapping is enabled, so check that.
+	 */
+	if (apic->apic_id_valid(32768))
+		return 0;
+
+	return x86_fwspec_is_ioapic(fwspec) || x86_fwspec_is_hpet(fwspec);
+}
+
 static const struct irq_domain_ops x86_vector_domain_ops = {
+	.select		= x86_vector_select,
 	.alloc		= x86_vector_alloc_irqs,
 	.free		= x86_vector_free_irqs,
 	.activate	= x86_vector_activate,
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3b8b12769f3b..fb7736ca7b5b 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -543,8 +543,8 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 {
 	struct msi_domain_info *domain_info;
 	struct irq_domain *parent, *d;
-	struct irq_alloc_info info;
 	struct fwnode_handle *fn;
+	struct irq_fwspec fwspec;
 
 	if (x86_vector_domain == NULL)
 		return NULL;
@@ -556,15 +556,6 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 	*domain_info = hpet_msi_domain_info;
 	domain_info->data = (void *)(long)hpet_id;
 
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-	info.devid = hpet_id;
-	parent = irq_remapping_get_irq_domain(&info);
-	if (parent == NULL)
-		parent = x86_vector_domain;
-	else
-		hpet_msi_controller.name = "IR-HPET-MSI";
-
 	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
 					      hpet_id);
 	if (!fn) {
@@ -572,6 +563,18 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 		return NULL;
 	}
 
+	fwspec.fwnode = fn;
+	fwspec.param_count = 1;
+	fwspec.param[0] = hpet_id;
+	parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+	if (!parent) {
+		irq_domain_free_fwnode(fn);
+		kfree(domain_info);
+		return NULL;
+	}
+	if (parent != x86_vector_domain)
+		hpet_msi_controller.name = "IR-HPET-MSI";
+
 	d = msi_create_irq_domain(fn, domain_info, parent);
 	if (!d) {
 		irq_domain_free_fwnode(fn);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 13d0a8f42d56..16adbf9d8fbb 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3536,10 +3536,8 @@ static int get_devid(struct irq_alloc_info *info)
 {
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
 		return get_ioapic_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_HPET:
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
 		return get_hpet_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
@@ -3550,46 +3548,32 @@ static int get_devid(struct irq_alloc_info *info)
 	}
 }
 
-static struct irq_domain *get_irq_domain_for_devid(struct irq_alloc_info *info,
-						   int devid)
-{
-	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
-	if (!iommu)
-		return NULL;
-
-	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		return iommu->ir_domain;
-	default:
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-}
-
-static struct irq_domain *get_irq_domain(struct irq_alloc_info *info)
-{
-	int devid;
-
-	if (!info)
-		return NULL;
-
-	devid = get_devid(info);
-	if (devid < 0)
-		return NULL;
-	return get_irq_domain_for_devid(info, devid);
-}
-
 struct irq_remap_ops amd_iommu_irq_ops = {
 	.prepare		= amd_iommu_prepare,
 	.enable			= amd_iommu_enable,
 	.disable		= amd_iommu_disable,
 	.reenable		= amd_iommu_reenable,
 	.enable_faulting	= amd_iommu_enable_faulting,
-	.get_irq_domain		= get_irq_domain,
 };
 
+static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+				enum irq_domain_bus_token bus_token)
+{
+	struct amd_iommu *iommu;
+	int devid = -1;
+
+	if (x86_fwspec_is_ioapic(fwspec))
+		devid = get_ioapic_devid(fwspec->param[0]);
+	else if (x86_fwspec_is_ioapic(fwspec))
+		devid = get_hpet_devid(fwspec->param[0]);
+
+	if (devid < 0)
+		return 0;
+
+	iommu = amd_iommu_rlookup_table[devid];
+	return iommu && iommu->ir_domain == d;
+}
+
 static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 				       struct irq_cfg *irq_cfg,
 				       struct irq_alloc_info *info,
@@ -3813,6 +3797,7 @@ static void irq_remapping_deactivate(struct irq_domain *domain,
 }
 
 static const struct irq_domain_ops amd_ir_domain_ops = {
+	.select = irq_remapping_select,
 	.alloc = irq_remapping_alloc,
 	.free = irq_remapping_free,
 	.activate = irq_remapping_activate,
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 37dd485a5640..e7ed2bb83358 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -61,6 +61,14 @@ static struct irq_chip hyperv_ir_chip = {
 	.irq_set_affinity	= hyperv_ir_set_affinity,
 };
 
+static int hyperv_irq_remapping_select(struct irq_domain *d,
+				       struct irq_fwspec *fwspec,
+				       enum irq_domain_bus_token bus_token)
+{
+	/* Claim only the first (and only) I/OAPIC */
+	return x86_fwspec_is_ioapic(fwspec) && fwspec->param[0] == 0;
+}
+
 static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
 				     unsigned int virq, unsigned int nr_irqs,
 				     void *arg)
@@ -102,6 +110,7 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain,
 }
 
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
+	.select = hyperv_irq_remapping_select,
 	.alloc = hyperv_irq_remapping_alloc,
 	.free = hyperv_irq_remapping_free,
 };
@@ -151,18 +160,9 @@ static int __init hyperv_enable_irq_remapping(void)
 	return IRQ_REMAP_X2APIC_MODE;
 }
 
-static struct irq_domain *hyperv_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT)
-		return ioapic_ir_domain;
-	else
-		return NULL;
-}
-
 struct irq_remap_ops hyperv_irq_remap_ops = {
 	.prepare		= hyperv_prepare_irq_remapping,
 	.enable			= hyperv_enable_irq_remapping,
-	.get_irq_domain		= hyperv_get_irq_domain,
 };
 
 #endif
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 511dfb4884bc..ccf61cd18f69 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1128,29 +1128,12 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 	irte->redir_hint = 1;
 }
 
-static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (!info)
-		return NULL;
-
-	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-		return map_ioapic_to_ir(info->devid);
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		return map_hpet_to_ir(info->devid);
-	default:
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-}
-
 struct irq_remap_ops intel_irq_remap_ops = {
 	.prepare		= intel_prepare_irq_remapping,
 	.enable			= intel_enable_irq_remapping,
 	.disable		= disable_irq_remapping,
 	.reenable		= reenable_irq_remapping,
 	.enable_faulting	= enable_drhd_fault_handling,
-	.get_irq_domain		= intel_get_irq_domain,
 };
 
 static void intel_ir_reconfigure_irte(struct irq_data *irqd, bool force)
@@ -1435,7 +1418,20 @@ static void intel_irq_remapping_deactivate(struct irq_domain *domain,
 	modify_irte(&data->irq_2_iommu, &entry);
 }
 
+static int intel_irq_remapping_select(struct irq_domain *d,
+				      struct irq_fwspec *fwspec,
+				      enum irq_domain_bus_token bus_token)
+{
+	if (x86_fwspec_is_ioapic(fwspec))
+		return d == map_ioapic_to_ir(fwspec->param[0]);
+	else if (x86_fwspec_is_hpet(fwspec))
+		return d == map_hpet_to_ir(fwspec->param[0]);
+
+	return 0;
+}
+
 static const struct irq_domain_ops intel_ir_domain_ops = {
+	.select = intel_irq_remapping_select,
 	.alloc = intel_irq_remapping_alloc,
 	.free = intel_irq_remapping_free,
 	.activate = intel_irq_remapping_activate,
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 2d84b1ed205e..83314b9d8f38 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -158,17 +158,3 @@ void panic_if_irq_remap(const char *msg)
 	if (irq_remapping_enabled)
 		panic(msg);
 }
-
-/**
- * irq_remapping_get_irq_domain - Get the irqdomain serving the request @info
- * @info: interrupt allocation information, used to identify the IOMMU device
- *
- * Returns pointer to IRQ domain, or NULL on failure.
- */
-struct irq_domain *irq_remapping_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (!remap_ops || !remap_ops->get_irq_domain)
-		return NULL;
-
-	return remap_ops->get_irq_domain(info);
-}
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 1661b3d75920..8c89cb947cdb 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -42,9 +42,6 @@ struct irq_remap_ops {
 
 	/* Enable fault handling */
 	int  (*enable_faulting)(void);
-
-	/* Get the irqdomain associated to IOMMU device */
-	struct irq_domain *(*get_irq_domain)(struct irq_alloc_info *);
 };
 
 extern struct irq_remap_ops intel_irq_remap_ops;
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 76cd7ebd1178..6440f97c412e 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -42,7 +42,16 @@ static inline void debugfs_add_domain_dir(struct irq_domain *d) { }
 static inline void debugfs_remove_domain_dir(struct irq_domain *d) { }
 #endif
 
-const struct fwnode_operations irqchip_fwnode_ops;
+static const char *irqchip_fwnode_get_name(const struct fwnode_handle *fwnode)
+{
+	struct irqchip_fwid *fwid = container_of(fwnode, struct irqchip_fwid, fwnode);
+
+	return fwid->name;
+}
+
+const struct fwnode_operations irqchip_fwnode_ops = {
+	.get_name = irqchip_fwnode_get_name,
+};
 EXPORT_SYMBOL_GPL(irqchip_fwnode_ops);
 
 /**

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-12 16:06                             ` David Woodhouse
@ 2020-10-12 18:38                               ` Thomas Gleixner
  2020-10-12 20:20                                 ` David Woodhouse
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-12 18:38 UTC (permalink / raw)
  To: David Woodhouse, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

On Mon, Oct 12 2020 at 17:06, David Woodhouse wrote:
> On Mon, 2020-10-12 at 11:33 +0200, Thomas Gleixner wrote:
>> You might want to look into using irq_find_matching_fwspec() instead for
>> both HPET and IOAPIC. That needs a select() callback implemented in the
>> remapping domains.
>
> That works.

:)

Nasty, but way better than what we have now. Now I start to wonder
whether we can get rid of a few other things as well especially the
non-standard alloc_info thingy.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-12 18:38                               ` Thomas Gleixner
@ 2020-10-12 20:20                                 ` David Woodhouse
  2020-10-12 22:13                                   ` Thomas Gleixner
  0 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2020-10-12 20:20 UTC (permalink / raw)
  To: Thomas Gleixner, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel


[-- Attachment #1: Type: text/plain, Size: 788 bytes --]

On Mon, 2020-10-12 at 20:38 +0200, Thomas Gleixner wrote:
> On Mon, Oct 12 2020 at 17:06, David Woodhouse wrote:
> > On Mon, 2020-10-12 at 11:33 +0200, Thomas Gleixner wrote:
> > > You might want to look into using irq_find_matching_fwspec()
> > > instead for
> > > both HPET and IOAPIC. That needs a select() callback implemented
> > > in the
> > > remapping domains.
> > 
> > That works.
> 
> :)
> 
> Nasty, but way better than what we have now. 

Want me to send that out in email or is the git tree enough for now?

I've cleaned it up a little and fixed a bug in the I/OAPIC error path.

Still not entirely convinced about the apic->apic_id_valid(32768) thing
but it should work well enough, and doesn't require exporting any extra
state from apic.c that way.


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-12 20:20                                 ` David Woodhouse
@ 2020-10-12 22:13                                   ` Thomas Gleixner
  2020-10-13  7:52                                     ` David Woodhouse
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-12 22:13 UTC (permalink / raw)
  To: David Woodhouse, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

On Mon, Oct 12 2020 at 21:20, David Woodhouse wrote:
> On Mon, 2020-10-12 at 20:38 +0200, Thomas Gleixner wrote:
>> Nasty, but way better than what we have now. 
>
> Want me to send that out in email or is the git tree enough for now?
>
> I've cleaned it up a little and fixed a bug in the I/OAPIC error path.

Mail would be nice once you are confident with the pile.

> Still not entirely convinced about the apic->apic_id_valid(32768) thing
> but it should work well enough, and doesn't require exporting any extra
> state from apic.c that way.

Yeah, that part is odd.

I really dislike the way how irq_find_matching_fwspec() works. The 'rc'
value is actually boolean despite being type 'int' and if 'rc' is not 0
then it returns the domain even if 'rc' is an error code.

But that does not allow to return error codes from a domain match() /
select() callback which is what we really want to express that there is
something fishy.

Something like the below perhaps? Needs more thought obviously.

Marc, any opinion ?

Thanks,

        tglx

8<-------------

arch/x86/kernel/apic/vector.c |   18 ++++++++++++++++++
 include/linux/irqdomain.h     |    1 +
 kernel/irq/irqdomain.c        |    2 +-
 3 files changed, 20 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -593,6 +593,24 @@ static int x86_vector_alloc_irqs(struct
 	return err;
 }
 
+struct irq_domain *x86_select_parent_domain(struct irq_fwspec *fwspec)
+{
+	struct irq_domain *dom;
+
+	/*
+	 * If Interrupt Remapping is enabled then the match function
+	 * returns either the remapping domain or an error code if the
+	 * device is not registered with the remapping unit.
+	 *
+	 * If remapping is not enabled, then the function returns NULL.
+	 */
+	dom = irq_find_matching_fwspec(fwspec, DOMAIN_BUS_IR);
+	if (dom)
+		return IS_ERR(dom) ? NULL : dom;
+
+	return x86_vector_domain;
+}
+
 #ifdef CONFIG_GENERIC_IRQ_DEBUGFS
 static void x86_vector_debug_show(struct seq_file *m, struct irq_domain *d,
 				  struct irq_data *irqd, int ind)
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -85,6 +85,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_TI_SCI_INTA_MSI,
 	DOMAIN_BUS_WAKEUP,
 	DOMAIN_BUS_VMD_MSI,
+	DOMAIN_BUS_IR,
 };
 
 /**
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -395,7 +395,7 @@ struct irq_domain *irq_find_matching_fws
 			       (h->bus_token == bus_token)));
 
 		if (rc) {
-			found = h;
+			found = rc < 0 ? ERR_PTR(rc) : h;
 			break;
 		}
 	}




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-12 22:13                                   ` Thomas Gleixner
@ 2020-10-13  7:52                                     ` David Woodhouse
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
  2020-10-13  9:28                                       ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID Thomas Gleixner
  0 siblings, 2 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-13  7:52 UTC (permalink / raw)
  To: Thomas Gleixner, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel


[-- Attachment #1: Type: text/plain, Size: 2046 bytes --]

On Tue, 2020-10-13 at 00:13 +0200, Thomas Gleixner wrote:
> On Mon, Oct 12 2020 at 21:20, David Woodhouse wrote:
> > On Mon, 2020-10-12 at 20:38 +0200, Thomas Gleixner wrote:
> > > Nasty, but way better than what we have now. 
> > 
> > Want me to send that out in email or is the git tree enough for now?
> > 
> > I've cleaned it up a little and fixed a bug in the I/OAPIC error path.
> 
> Mail would be nice once you are confident with the pile.

After the next cup of coffee. Will send it as an incremental series on
top of my previous ext_dest_id series.

> > Still not entirely convinced about the apic->apic_id_valid(32768) thing
> > but it should work well enough, and doesn't require exporting any extra
> > state from apic.c that way.
> 
> Yeah, that part is odd.
> 
> I really dislike the way how irq_find_matching_fwspec() works. The 'rc'
> value is actually boolean despite being type 'int' and if 'rc' is not 0
> then it returns the domain even if 'rc' is an error code.
> 
> But that does not allow to return error codes from a domain match() /
> select() callback which is what we really want to express that there is
> something fishy.

I don't know that we need to return an explicit error. I made
x86_vector_select() refuse to match if IRQ remapping is enabled, which
means that the irq_find_matching_fwspec() call will fail by returning
NULL in precisely the cases it should. (Which should never happen; qv).

That failure means HPET will refuse to set up MSI, or I/OAPIC will
refuse to initialise (later causing a BUG and a failure to boot, which
is probably the correct thing to do).

It's all fine.

+       dom = irq_find_matching_fwspec(fwspec, DOMAIN_BUS_IR);
+       if (dom)
+               return IS_ERR(dom) ? NULL : dom;
+
+       return x86_vector_domain;
+}

Ick. There's no need for that.

Eliminating that awful "if not found then slip the x86_vector_domain in
as a special case" was the whole *point* of using
irq_find_matching_fwspec() in the first place.



[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 0/9] Remove irq_remapping_get_irq_domain()
  2020-10-13  7:52                                     ` David Woodhouse
@ 2020-10-13  8:11                                       ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 1/9] genirq/irqdomain: Implement get_name() method on irqchip fwnodes David Woodhouse
                                                           ` (8 more replies)
  2020-10-13  9:28                                       ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID Thomas Gleixner
  1 sibling, 9 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

I didn't much like the I/OAPIC and HPET drivers having magical knowledge
that they had to substitute x86_vector_domain if their call to
irq_remapping_get_irq_domain() returned NULL.

When Thomas tried to make it handle error returns from …get_irq_domain() 
distinctly from the NULL case too, it made me even sadder. So I killed 
it with fire.

Now they just use irq_find_matching_fwspec() to find an appropriate
irqdomain. Each remapping irqdomain just needs to say 'yep, that's me'
for the HPETs or I/OAPICs which are within their scope, while the
x86_vector_domain accepts them all but only if interrupt remapping
is *disabled*. No more special knowledge in the caller.

If IR is enabled and there's a child device which escapes the scope of
all remapping units, it gets NULL for its parent irqdomain and will
fail to initialise, which is the correct thing to do in that "should
never happen" case. For HPET that'll mean that it just doesn't support
MSI, while I/OAPIC will refuse to initialise and trigger a BUG_ON
because Linux quite likes it when *all* the I/OAPICs it knows about get
initialised successfully.

This is on top of the previous 'ext_dest_id' series at
https://patchwork.kernel.org/project/kvm/list/?series=362037

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id

David Woodhouse (9):
      genirq/irqdomain: Implement get_name() method on irqchip fwnodes
      x86/apic: Add select() method on vector irqdomain
      iommu/amd: Implement select() method on remapping irqdomain
      iommu/vt-d: Implement select() method on remapping irqdomain
      iommu/hyper-v: Implement select() method on remapping irqdomain
      x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
      x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
      x86: Kill all traces of irq_remapping_get_irq_domain()
      iommu/vt-d: Simplify intel_irq_remapping_select()

 arch/x86/include/asm/hw_irq.h        |  2 --
 arch/x86/include/asm/irq_remapping.h |  9 ---------
 arch/x86/include/asm/irqdomain.h     |  3 +++
 arch/x86/kernel/apic/io_apic.c       | 24 ++++++++++++------------
 arch/x86/kernel/apic/vector.c        | 43 +++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/hpet.c               | 23 +++++++++++++----------
 drivers/iommu/amd/iommu.c            | 53 +++++++++++++++++++----------------------------------
 drivers/iommu/hyperv-iommu.c         | 18 +++++++++---------
 drivers/iommu/intel/irq_remapping.c  | 43 +++++++++++++++++--------------------------
 drivers/iommu/irq_remapping.c        | 14 --------------
 drivers/iommu/irq_remapping.h        |  3 ---
 kernel/irq/irqdomain.c               | 11 ++++++++++-
 12 files changed, 126 insertions(+), 120 deletions(-)




^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 1/9] genirq/irqdomain: Implement get_name() method on irqchip fwnodes
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 2/9] x86/apic: Add select() method on vector irqdomain David Woodhouse
                                                           ` (7 subsequent siblings)
  8 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 kernel/irq/irqdomain.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 76cd7ebd1178..6440f97c412e 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -42,7 +42,16 @@ static inline void debugfs_add_domain_dir(struct irq_domain *d) { }
 static inline void debugfs_remove_domain_dir(struct irq_domain *d) { }
 #endif
 
-const struct fwnode_operations irqchip_fwnode_ops;
+static const char *irqchip_fwnode_get_name(const struct fwnode_handle *fwnode)
+{
+	struct irqchip_fwid *fwid = container_of(fwnode, struct irqchip_fwid, fwnode);
+
+	return fwid->name;
+}
+
+const struct fwnode_operations irqchip_fwnode_ops = {
+	.get_name = irqchip_fwnode_get_name,
+};
 EXPORT_SYMBOL_GPL(irqchip_fwnode_ops);
 
 /**
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 2/9] x86/apic: Add select() method on vector irqdomain
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 1/9] genirq/irqdomain: Implement get_name() method on irqchip fwnodes David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 3/9] iommu/amd: Implement select() method on remapping irqdomain David Woodhouse
                                                           ` (6 subsequent siblings)
  8 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

This will be used to select the irqdomain for I/OAPIC and HPET.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/irqdomain.h |  3 +++
 arch/x86/kernel/apic/vector.c    | 43 ++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
index cd684d45cb5f..125c23b7bad3 100644
--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -12,6 +12,9 @@ enum {
 	X86_IRQ_ALLOC_LEGACY				= 0x2,
 };
 
+extern int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec);
+extern int x86_fwspec_is_hpet(struct irq_fwspec *fwspec);
+
 extern struct irq_domain *x86_vector_domain;
 
 extern void init_irq_alloc_info(struct irq_alloc_info *info,
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb2e2a2488a5..b9b05caa28a4 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -636,7 +636,50 @@ static void x86_vector_debug_show(struct seq_file *m, struct irq_domain *d,
 }
 #endif
 
+int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec)
+{
+	if (fwspec->param_count != 1)
+		return 0;
+
+	if (is_fwnode_irqchip(fwspec->fwnode)) {
+		const char *fwname = fwnode_get_name(fwspec->fwnode);
+		return fwname && !strncmp(fwname, "IO-APIC-", 8) &&
+			simple_strtol(fwname+8, NULL, 10) == fwspec->param[0];
+	}
+	return to_of_node(fwspec->fwnode) &&
+		of_device_is_compatible(to_of_node(fwspec->fwnode),
+					"intel,ce4100-ioapic");
+}
+
+int x86_fwspec_is_hpet(struct irq_fwspec *fwspec)
+{
+	if (fwspec->param_count != 1)
+		return 0;
+
+	if (is_fwnode_irqchip(fwspec->fwnode)) {
+		const char *fwname = fwnode_get_name(fwspec->fwnode);
+		return fwname && !strncmp(fwname, "HPET-MSI-", 9) &&
+			simple_strtol(fwname+9, NULL, 10) == fwspec->param[0];
+	}
+	return 0;
+}
+
+static int x86_vector_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+			     enum irq_domain_bus_token bus_token)
+{
+	/*
+	 * HPET and I/OAPIC cannot be parented in the vector domain
+	 * if IRQ remapping is enabled. APIC IDs above 15 bits are
+	 * only permitted if IRQ remapping is enabled, so check that.
+	 */
+	if (apic->apic_id_valid(32768))
+		return 0;
+
+	return x86_fwspec_is_ioapic(fwspec) || x86_fwspec_is_hpet(fwspec);
+}
+
 static const struct irq_domain_ops x86_vector_domain_ops = {
+	.select		= x86_vector_select,
 	.alloc		= x86_vector_alloc_irqs,
 	.free		= x86_vector_free_irqs,
 	.activate	= x86_vector_activate,
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 3/9] iommu/amd: Implement select() method on remapping irqdomain
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 1/9] genirq/irqdomain: Implement get_name() method on irqchip fwnodes David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 2/9] x86/apic: Add select() method on vector irqdomain David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 4/9] iommu/vt-d: " David Woodhouse
                                                           ` (5 subsequent siblings)
  8 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/iommu/amd/iommu.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 13d0a8f42d56..7ecebc5d255f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3590,6 +3590,24 @@ struct irq_remap_ops amd_iommu_irq_ops = {
 	.get_irq_domain		= get_irq_domain,
 };
 
+static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+				enum irq_domain_bus_token bus_token)
+{
+	struct amd_iommu *iommu;
+	int devid = -1;
+
+	if (x86_fwspec_is_ioapic(fwspec))
+		devid = get_ioapic_devid(fwspec->param[0]);
+	else if (x86_fwspec_is_ioapic(fwspec))
+		devid = get_hpet_devid(fwspec->param[0]);
+
+	if (devid < 0)
+		return 0;
+
+	iommu = amd_iommu_rlookup_table[devid];
+	return iommu && iommu->ir_domain == d;
+}
+
 static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 				       struct irq_cfg *irq_cfg,
 				       struct irq_alloc_info *info,
@@ -3813,6 +3831,7 @@ static void irq_remapping_deactivate(struct irq_domain *domain,
 }
 
 static const struct irq_domain_ops amd_ir_domain_ops = {
+	.select = irq_remapping_select,
 	.alloc = irq_remapping_alloc,
 	.free = irq_remapping_free,
 	.activate = irq_remapping_activate,
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 4/9] iommu/vt-d: Implement select() method on remapping irqdomain
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
                                                           ` (2 preceding siblings ...)
  2020-10-13  8:11                                         ` [PATCH 3/9] iommu/amd: Implement select() method on remapping irqdomain David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 5/9] iommu/hyper-v: " David Woodhouse
                                                           ` (4 subsequent siblings)
  8 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/iommu/intel/irq_remapping.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 511dfb4884bc..40c2fec122b8 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1435,7 +1435,20 @@ static void intel_irq_remapping_deactivate(struct irq_domain *domain,
 	modify_irte(&data->irq_2_iommu, &entry);
 }
 
+static int intel_irq_remapping_select(struct irq_domain *d,
+				      struct irq_fwspec *fwspec,
+				      enum irq_domain_bus_token bus_token)
+{
+	if (x86_fwspec_is_ioapic(fwspec))
+		return d == map_ioapic_to_ir(fwspec->param[0]);
+	else if (x86_fwspec_is_hpet(fwspec))
+		return d == map_hpet_to_ir(fwspec->param[0]);
+
+	return 0;
+}
+
 static const struct irq_domain_ops intel_ir_domain_ops = {
+	.select = intel_irq_remapping_select,
 	.alloc = intel_irq_remapping_alloc,
 	.free = intel_irq_remapping_free,
 	.activate = intel_irq_remapping_activate,
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 5/9] iommu/hyper-v: Implement select() method on remapping irqdomain
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
                                                           ` (3 preceding siblings ...)
  2020-10-13  8:11                                         ` [PATCH 4/9] iommu/vt-d: " David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 6/9] x86/hpet: Use irq_find_matching_fwspec() to find " David Woodhouse
                                                           ` (3 subsequent siblings)
  8 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/iommu/hyperv-iommu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 37dd485a5640..6a8966fbc3bd 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -61,6 +61,14 @@ static struct irq_chip hyperv_ir_chip = {
 	.irq_set_affinity	= hyperv_ir_set_affinity,
 };
 
+static int hyperv_irq_remapping_select(struct irq_domain *d,
+				       struct irq_fwspec *fwspec,
+				       enum irq_domain_bus_token bus_token)
+{
+	/* Claim only the first (and only) I/OAPIC */
+	return x86_fwspec_is_ioapic(fwspec) && fwspec->param[0] == 0;
+}
+
 static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
 				     unsigned int virq, unsigned int nr_irqs,
 				     void *arg)
@@ -102,6 +110,7 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain,
 }
 
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
+	.select = hyperv_irq_remapping_select,
 	.alloc = hyperv_irq_remapping_alloc,
 	.free = hyperv_irq_remapping_free,
 };
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 6/9] x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
                                                           ` (4 preceding siblings ...)
  2020-10-13  8:11                                         ` [PATCH 5/9] iommu/hyper-v: " David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 7/9] x86/ioapic: " David Woodhouse
                                                           ` (2 subsequent siblings)
  8 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kernel/hpet.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3b8b12769f3b..fb7736ca7b5b 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -543,8 +543,8 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 {
 	struct msi_domain_info *domain_info;
 	struct irq_domain *parent, *d;
-	struct irq_alloc_info info;
 	struct fwnode_handle *fn;
+	struct irq_fwspec fwspec;
 
 	if (x86_vector_domain == NULL)
 		return NULL;
@@ -556,15 +556,6 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 	*domain_info = hpet_msi_domain_info;
 	domain_info->data = (void *)(long)hpet_id;
 
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-	info.devid = hpet_id;
-	parent = irq_remapping_get_irq_domain(&info);
-	if (parent == NULL)
-		parent = x86_vector_domain;
-	else
-		hpet_msi_controller.name = "IR-HPET-MSI";
-
 	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
 					      hpet_id);
 	if (!fn) {
@@ -572,6 +563,18 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 		return NULL;
 	}
 
+	fwspec.fwnode = fn;
+	fwspec.param_count = 1;
+	fwspec.param[0] = hpet_id;
+	parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+	if (!parent) {
+		irq_domain_free_fwnode(fn);
+		kfree(domain_info);
+		return NULL;
+	}
+	if (parent != x86_vector_domain)
+		hpet_msi_controller.name = "IR-HPET-MSI";
+
 	d = msi_create_irq_domain(fn, domain_info, parent);
 	if (!d) {
 		irq_domain_free_fwnode(fn);
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 7/9] x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
                                                           ` (5 preceding siblings ...)
  2020-10-13  8:11                                         ` [PATCH 6/9] x86/hpet: Use irq_find_matching_fwspec() to find " David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 8/9] x86: Kill all traces of irq_remapping_get_irq_domain() David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 9/9] iommu/vt-d: Simplify intel_irq_remapping_select() David Woodhouse
  8 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kernel/apic/io_apic.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ca2da19d5c55..73cacc92c3bb 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2305,36 +2305,36 @@ static inline void __init check_timer(void)
 
 static int mp_irqdomain_create(int ioapic)
 {
-	struct irq_alloc_info info;
 	struct irq_domain *parent;
 	int hwirqs = mp_ioapic_pin_count(ioapic);
 	struct ioapic *ip = &ioapics[ioapic];
 	struct ioapic_domain_cfg *cfg = &ip->irqdomain_cfg;
 	struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(ioapic);
 	struct fwnode_handle *fn;
-	char *name = "IO-APIC";
+	struct irq_fwspec fwspec;
 
 	if (cfg->type == IOAPIC_DOMAIN_INVALID)
 		return 0;
 
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
-	info.devid = mpc_ioapic_id(ioapic);
-	parent = irq_remapping_get_irq_domain(&info);
-	if (!parent)
-		parent = x86_vector_domain;
-	else
-		name = "IO-APIC-IR";
-
 	/* Handle device tree enumerated APICs proper */
 	if (cfg->dev) {
 		fn = of_node_to_fwnode(cfg->dev);
 	} else {
-		fn = irq_domain_alloc_named_id_fwnode(name, ioapic);
+		fn = irq_domain_alloc_named_id_fwnode("IO-APIC", ioapic);
 		if (!fn)
 			return -ENOMEM;
 	}
 
+	fwspec.fwnode = fn;
+	fwspec.param_count = 1;
+	fwspec.param[0] = ioapic;
+	parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+	if (!parent) {
+		if (!cfg->dev)
+			irq_domain_free_fwnode(fn);
+		return -ENODEV;
+	}
+
 	ip->irqdomain = irq_domain_create_linear(fn, hwirqs, cfg->ops,
 						 (void *)(long)ioapic);
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 8/9] x86: Kill all traces of irq_remapping_get_irq_domain()
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
                                                           ` (6 preceding siblings ...)
  2020-10-13  8:11                                         ` [PATCH 7/9] x86/ioapic: " David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 9/9] iommu/vt-d: Simplify intel_irq_remapping_select() David Woodhouse
  8 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/hw_irq.h        |  2 --
 arch/x86/include/asm/irq_remapping.h |  9 --------
 drivers/iommu/amd/iommu.c            | 34 ----------------------------
 drivers/iommu/hyperv-iommu.c         |  9 --------
 drivers/iommu/intel/irq_remapping.c  | 17 --------------
 drivers/iommu/irq_remapping.c        | 14 ------------
 drivers/iommu/irq_remapping.h        |  3 ---
 7 files changed, 88 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index aabd8f1b6bb0..aef795f17478 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -40,8 +40,6 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_PCI_MSIX,
 	X86_IRQ_ALLOC_TYPE_DMAR,
 	X86_IRQ_ALLOC_TYPE_UV,
-	X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT,
-	X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
 struct ioapic_alloc_info {
diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index af4a151d70b3..7cc49432187f 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -44,9 +44,6 @@ extern int irq_remapping_reenable(int);
 extern int irq_remap_enable_fault_handling(void);
 extern void panic_if_irq_remap(const char *msg);
 
-extern struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info);
-
 /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
 extern struct irq_domain *
 arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int id);
@@ -71,11 +68,5 @@ static inline void panic_if_irq_remap(const char *msg)
 {
 }
 
-static inline struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info)
-{
-	return NULL;
-}
-
 #endif /* CONFIG_IRQ_REMAP */
 #endif /* __X86_IRQ_REMAPPING_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7ecebc5d255f..16adbf9d8fbb 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3536,10 +3536,8 @@ static int get_devid(struct irq_alloc_info *info)
 {
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
 		return get_ioapic_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_HPET:
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
 		return get_hpet_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
@@ -3550,44 +3548,12 @@ static int get_devid(struct irq_alloc_info *info)
 	}
 }
 
-static struct irq_domain *get_irq_domain_for_devid(struct irq_alloc_info *info,
-						   int devid)
-{
-	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
-	if (!iommu)
-		return NULL;
-
-	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		return iommu->ir_domain;
-	default:
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-}
-
-static struct irq_domain *get_irq_domain(struct irq_alloc_info *info)
-{
-	int devid;
-
-	if (!info)
-		return NULL;
-
-	devid = get_devid(info);
-	if (devid < 0)
-		return NULL;
-	return get_irq_domain_for_devid(info, devid);
-}
-
 struct irq_remap_ops amd_iommu_irq_ops = {
 	.prepare		= amd_iommu_prepare,
 	.enable			= amd_iommu_enable,
 	.disable		= amd_iommu_disable,
 	.reenable		= amd_iommu_reenable,
 	.enable_faulting	= amd_iommu_enable_faulting,
-	.get_irq_domain		= get_irq_domain,
 };
 
 static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 6a8966fbc3bd..e7ed2bb83358 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -160,18 +160,9 @@ static int __init hyperv_enable_irq_remapping(void)
 	return IRQ_REMAP_X2APIC_MODE;
 }
 
-static struct irq_domain *hyperv_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT)
-		return ioapic_ir_domain;
-	else
-		return NULL;
-}
-
 struct irq_remap_ops hyperv_irq_remap_ops = {
 	.prepare		= hyperv_prepare_irq_remapping,
 	.enable			= hyperv_enable_irq_remapping,
-	.get_irq_domain		= hyperv_get_irq_domain,
 };
 
 #endif
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 40c2fec122b8..ccf61cd18f69 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1128,29 +1128,12 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 	irte->redir_hint = 1;
 }
 
-static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (!info)
-		return NULL;
-
-	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-		return map_ioapic_to_ir(info->devid);
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		return map_hpet_to_ir(info->devid);
-	default:
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-}
-
 struct irq_remap_ops intel_irq_remap_ops = {
 	.prepare		= intel_prepare_irq_remapping,
 	.enable			= intel_enable_irq_remapping,
 	.disable		= disable_irq_remapping,
 	.reenable		= reenable_irq_remapping,
 	.enable_faulting	= enable_drhd_fault_handling,
-	.get_irq_domain		= intel_get_irq_domain,
 };
 
 static void intel_ir_reconfigure_irte(struct irq_data *irqd, bool force)
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 2d84b1ed205e..83314b9d8f38 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -158,17 +158,3 @@ void panic_if_irq_remap(const char *msg)
 	if (irq_remapping_enabled)
 		panic(msg);
 }
-
-/**
- * irq_remapping_get_irq_domain - Get the irqdomain serving the request @info
- * @info: interrupt allocation information, used to identify the IOMMU device
- *
- * Returns pointer to IRQ domain, or NULL on failure.
- */
-struct irq_domain *irq_remapping_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (!remap_ops || !remap_ops->get_irq_domain)
-		return NULL;
-
-	return remap_ops->get_irq_domain(info);
-}
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 1661b3d75920..8c89cb947cdb 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -42,9 +42,6 @@ struct irq_remap_ops {
 
 	/* Enable fault handling */
 	int  (*enable_faulting)(void);
-
-	/* Get the irqdomain associated to IOMMU device */
-	struct irq_domain *(*get_irq_domain)(struct irq_alloc_info *);
 };
 
 extern struct irq_remap_ops intel_irq_remap_ops;
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 9/9] iommu/vt-d: Simplify intel_irq_remapping_select()
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
                                                           ` (7 preceding siblings ...)
  2020-10-13  8:11                                         ` [PATCH 8/9] x86: Kill all traces of irq_remapping_get_irq_domain() David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  8 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Now that the old get_irq_domain() method has gone, we can consolidate on
just the map_XXX_to_iommu() functions.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/iommu/intel/irq_remapping.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index ccf61cd18f69..8a41bcb10e26 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -204,13 +204,13 @@ static int modify_irte(struct irq_2_iommu *irq_iommu,
 	return rc;
 }
 
-static struct irq_domain *map_hpet_to_ir(u8 hpet_id)
+static struct intel_iommu *map_hpet_to_iommu(u8 hpet_id)
 {
 	int i;
 
 	for (i = 0; i < MAX_HPET_TBS; i++) {
 		if (ir_hpet[i].id == hpet_id && ir_hpet[i].iommu)
-			return ir_hpet[i].iommu->ir_domain;
+			return ir_hpet[i].iommu;
 	}
 	return NULL;
 }
@@ -226,13 +226,6 @@ static struct intel_iommu *map_ioapic_to_iommu(int apic)
 	return NULL;
 }
 
-static struct irq_domain *map_ioapic_to_ir(int apic)
-{
-	struct intel_iommu *iommu = map_ioapic_to_iommu(apic);
-
-	return iommu ? iommu->ir_domain : NULL;
-}
-
 static struct irq_domain *map_dev_to_ir(struct pci_dev *dev)
 {
 	struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev);
@@ -1422,12 +1415,14 @@ static int intel_irq_remapping_select(struct irq_domain *d,
 				      struct irq_fwspec *fwspec,
 				      enum irq_domain_bus_token bus_token)
 {
+	struct intel_iommu *iommu = NULL;
+
 	if (x86_fwspec_is_ioapic(fwspec))
-		return d == map_ioapic_to_ir(fwspec->param[0]);
+		iommu = map_ioapic_to_iommu(fwspec->param[0]);
 	else if (x86_fwspec_is_hpet(fwspec))
-		return d == map_hpet_to_ir(fwspec->param[0]);
+		iommu = map_hpet_to_iommu(fwspec->param[0]);
 
-	return 0;
+	return iommu && d == iommu->ir_domain;
 }
 
 static const struct irq_domain_ops intel_ir_domain_ops = {
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-13  7:52                                     ` David Woodhouse
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
@ 2020-10-13  9:28                                       ` Thomas Gleixner
  2020-10-13 10:15                                         ` David Woodhouse
  2020-10-13 10:46                                         ` Thomas Gleixner
  1 sibling, 2 replies; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-13  9:28 UTC (permalink / raw)
  To: David Woodhouse, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

On Tue, Oct 13 2020 at 08:52, David Woodhouse wrote:
> On Tue, 2020-10-13 at 00:13 +0200, Thomas Gleixner wrote:
> +       dom = irq_find_matching_fwspec(fwspec, DOMAIN_BUS_IR);
> +       if (dom)
> +               return IS_ERR(dom) ? NULL : dom;
> +
> +       return x86_vector_domain;
> +}
>
> Ick. There's no need for that.
>
> Eliminating that awful "if not found then slip the x86_vector_domain in
> as a special case" was the whole *point* of using
> irq_find_matching_fwspec() in the first place.

The point was to get rid of irq_remapping_get_irq_domain().

And TBH,

        if (apicid_valid(32768))

is just another way to slip the vector domain in. It's just differently
awful.

Having an explicit answer from the search for IR:

    - Here is the domain
    - Your device is not registered properly
    - IR not enabled or not supported

is way more obvious than the above disguised is_remapping_enabled()
check.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-13  9:28                                       ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID Thomas Gleixner
@ 2020-10-13 10:15                                         ` David Woodhouse
  2020-10-13 10:46                                         ` Thomas Gleixner
  1 sibling, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2020-10-13 10:15 UTC (permalink / raw)
  To: Thomas Gleixner, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel


[-- Attachment #1: Type: text/plain, Size: 2575 bytes --]

On Tue, 2020-10-13 at 11:28 +0200, Thomas Gleixner wrote:
> On Tue, Oct 13 2020 at 08:52, David Woodhouse wrote:
> > On Tue, 2020-10-13 at 00:13 +0200, Thomas Gleixner wrote:
> > +       dom = irq_find_matching_fwspec(fwspec, DOMAIN_BUS_IR);
> > +       if (dom)
> > +               return IS_ERR(dom) ? NULL : dom;
> > +
> > +       return x86_vector_domain;
> > +}
> > 
> > Ick. There's no need for that.
> > 
> > Eliminating that awful "if not found then slip the x86_vector_domain in
> > as a special case" was the whole *point* of using
> > irq_find_matching_fwspec() in the first place.
> 
> The point was to get rid of irq_remapping_get_irq_domain().

My reason for doing it was to get rid of irq_remapping_get_irq_domain()
*because* I hated the special-casing and magical slipping in of
x86_vector_domain when it returned NULL.

> And TBH,
> 
>         if (apicid_valid(32768))
> 
> is just another way to slip the vector domain in. It's just differently
> awful.

For me, that's very much not just "slipping the vector domain in".
That's the vector domain returning true in its *own* ->select()
function, in the circumstances where it wants to be used.

The key difference is that nobody needs an external magic pointer to
the x86_vector_domain. In a true irqdomain hierarchy system, shouldn't
we be trying to eliminate *all* those magic pointers to specific
domains, if we can?

And sure, the apicid_valid(32768) as a proxy for irq_remapping_enabled
is a bit of an ugly trick. I suppose we can explicitly expose
irq_remapping_enabled from drivers/iommu if we wanted to.

> Having an explicit answer from the search for IR:
> 
>     - Here is the domain
>     - Your device is not registered properly
>     - IR not enabled or not supported
> 
> is way more obvious than the above disguised is_remapping_enabled()
> check.

I just don't even like thinking of it as a 'search for IR'.

HPET shouldn't be caring about IR any more than PCI devices do. It just
wants its parent irqdomain, that's all.

For I/OAPIC there's the slight complexity that it does actually ack
level-triggered interrupts differently when it's behind IR. But I don't
think we need a whole separate irq_chip for that; surely it could be
handled internally in ioapic_ack_level() ? 

Either way, even with that slight hack it's nicer to think of
mp_irqdomain_create() just wanting to find its parent domain, without
any special knowledge of IR and falling back to x86_parent_domain. The
hack for IR level-ack is then self-contained.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-13  9:28                                       ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID Thomas Gleixner
  2020-10-13 10:15                                         ` David Woodhouse
@ 2020-10-13 10:46                                         ` Thomas Gleixner
  2020-10-13 10:53                                           ` David Woodhouse
  1 sibling, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-13 10:46 UTC (permalink / raw)
  To: David Woodhouse, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

On Tue, Oct 13 2020 at 11:28, Thomas Gleixner wrote:
> On Tue, Oct 13 2020 at 08:52, David Woodhouse wrote:
>> On Tue, 2020-10-13 at 00:13 +0200, Thomas Gleixner wrote:
>> +       dom = irq_find_matching_fwspec(fwspec, DOMAIN_BUS_IR);
>> +       if (dom)
>> +               return IS_ERR(dom) ? NULL : dom;
>> +
>> +       return x86_vector_domain;
>> +}
>>
>> Ick. There's no need for that.
>>
>> Eliminating that awful "if not found then slip the x86_vector_domain in
>> as a special case" was the whole *point* of using
>> irq_find_matching_fwspec() in the first place.
>
> The point was to get rid of irq_remapping_get_irq_domain().
>
> And TBH,
>
>         if (apicid_valid(32768))
>
> is just another way to slip the vector domain in. It's just differently
> awful.
>
> Having an explicit answer from the search for IR:
>
>     - Here is the domain
>     - Your device is not registered properly
>     - IR not enabled or not supported
>
> is way more obvious than the above disguised is_remapping_enabled()
> check.

And after becoming more awake, that wont work anyway because there is
more than one IR domain, so there is no way to return an error "You
forgot to register" obviously.

But the APIC id (32768) valid check is also broken because IR can be
enabled even without X2APIC.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-13 10:46                                         ` Thomas Gleixner
@ 2020-10-13 10:53                                           ` David Woodhouse
  2020-10-13 11:51                                             ` David Woodhouse
  0 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2020-10-13 10:53 UTC (permalink / raw)
  To: Thomas Gleixner, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel


[-- Attachment #1: Type: text/plain, Size: 773 bytes --]

On Tue, 2020-10-13 at 12:46 +0200, Thomas Gleixner wrote:
> And after becoming more awake, that wont work anyway because there is
> more than one IR domain, so there is no way to return an error "You
> forgot to register" obviously.
> 
> But the APIC id (32768) valid check is also broken because IR can be
> enabled even without X2APIC.

Nope, it's perfectly OK to allow HPET and I/OAPIC to be parented in the
x86_vector_domain in that case, regardless of IR.

The *actual* criterion for x86_vector_select() returning zero to say 
"don't use me" is "could there be CPUs in the system which can't be
reached through x86_vector_msi_compose_msg()?". It's not really about
IR at all.

The apic_id_valid(32768) check is checking precisely the right thing.



[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-13 10:53                                           ` David Woodhouse
@ 2020-10-13 11:51                                             ` David Woodhouse
  2020-10-13 12:40                                               ` Thomas Gleixner
  0 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2020-10-13 11:51 UTC (permalink / raw)
  To: Thomas Gleixner, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel


[-- Attachment #1: Type: text/plain, Size: 2458 bytes --]

On Tue, 2020-10-13 at 11:53 +0100, David Woodhouse wrote:
> On Tue, 2020-10-13 at 12:46 +0200, Thomas Gleixner wrote:
> > And after becoming more awake, that wont work anyway because there is
> > more than one IR domain, so there is no way to return an error "You
> > forgot to register" obviously.
> > 
> > But the APIC id (32768) valid check is also broken because IR can be
> > enabled even without X2APIC.
> 
> Nope, it's perfectly OK to allow HPET and I/OAPIC to be parented in the
> x86_vector_domain in that case, regardless of IR.
> 
> The *actual* criterion for x86_vector_select() returning zero to say 
> "don't use me" is "could there be CPUs in the system which can't be
> reached through x86_vector_msi_compose_msg()?". It's not really about
> IR at all.
> 
> The apic_id_valid(32768) check is checking precisely the right thing.

With that realisation, I've fixed the comment in my ext_dest_id branch
to remove all mention of IRQ remapping. It now looks like this:

static int x86_vector_select(struct irq_domain *d, struct irq_fwspec *fwspec,
			     enum irq_domain_bus_token bus_token)
{
	/*
	 * HPET and I/OAPIC drivers use irq_find_matching_irqdomain()
	 * to find their parent irqdomain. For x86_vector_domain to be
	 * suitable, all CPUs in the system must be reachable by its
	 * x86_vector_msi_compose_msg() function. Which is only true
	 * in !x2apic mode, or in x2apic physical mode if APIC IDs were
	 * restricted to 8 or 15 bits at boot time. In those cases,
	 * 1<<15 will *not* be a valid APIC ID.
	 */
	if (apic->apic_id_valid(1<<15))
		return 0;

	return x86_fwspec_is_ioapic(fwspec) || x86_fwspec_is_hpet(fwspec);
}

That makes it clearer that this isn't just some incestuous interaction
with IRQ remapping — that APIC ID limit really is the basis on which
this irqdomain, all by itself, makes the decision about whether it's
capable of being the parent irqdomain to the requesting device.

It just so happens that *if* you allow higher APIC IDs in the system
and *if* no other irqdomains exist which take ownership of a given HPET
or I/OAPIC, then that child device will fail to find a suitable parent
irqdomain and will fail to initialise. And that's just how we want it.

Sure, we have that special case in x2apic startup where we ensure that
if we don't have IR then we limit the APIC IDs. But that's there, and
*this* code is perfectly self-contained and isolated.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-13 11:51                                             ` David Woodhouse
@ 2020-10-13 12:40                                               ` Thomas Gleixner
  0 siblings, 0 replies; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-13 12:40 UTC (permalink / raw)
  To: David Woodhouse, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

On Tue, Oct 13 2020 at 12:51, David Woodhouse wrote:
> With that realisation, I've fixed the comment in my ext_dest_id branch
> to remove all mention of IRQ remapping. It now looks like this:
>
> static int x86_vector_select(struct irq_domain *d, struct irq_fwspec *fwspec,
> 			     enum irq_domain_bus_token bus_token)
> {
> 	/*
> 	 * HPET and I/OAPIC drivers use irq_find_matching_irqdomain()
> 	 * to find their parent irqdomain. For x86_vector_domain to be
> 	 * suitable, all CPUs in the system must be reachable by its
> 	 * x86_vector_msi_compose_msg() function. Which is only true
> 	 * in !x2apic mode, or in x2apic physical mode if APIC IDs were
> 	 * restricted to 8 or 15 bits at boot time. In those cases,
> 	 * 1<<15 will *not* be a valid APIC ID.
> 	 */
> 	if (apic->apic_id_valid(1<<15))
> 		return 0;
>
> 	return x86_fwspec_is_ioapic(fwspec) || x86_fwspec_is_hpet(fwspec);
> }
>
> That makes it clearer that this isn't just some incestuous interaction
> with IRQ remapping — that APIC ID limit really is the basis on which
> this irqdomain, all by itself, makes the decision about whether it's
> capable of being the parent irqdomain to the requesting device.

Yes, that makes sense now.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-09 10:46   ` [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message David Woodhouse
@ 2020-10-22 21:43     ` Thomas Gleixner
  2020-10-22 22:10       ` Thomas Gleixner
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-22 21:43 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel, linux-hyperv

On Fri, Oct 09 2020 at 11:46, David Woodhouse wrote:

@@ -45,12 +45,11 @@ enum irq_alloc_type {
 };

> +static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, void *_entry)
> +{
> +	struct msi_msg msg;
> +	u32 *entry = _entry;

Why is this a void * argument and then converting it to a u32 *? Just to
make that function completely unreadable?

> +
> +	irq_chip_compose_msi_msg(irq_data, &msg);

Lacks a comment. Also mp_swizzle... is a misnomer as this invokes the
msi compose function which is not what the function name suggests.

> +	/*
> +	 * They're in a bit of a random order for historical reasons, but
> +	 * the IO/APIC is just a device for turning interrupt lines into
> +	 * MSIs, and various bits of the MSI addr/data are just swizzled
> +	 * into/from the bits of Redirection Table Entry.
> +	 */
> +	entry[0] &= 0xfffff000;
> +	entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK |
> +				 MSI_DATA_VECTOR_MASK));
> +	entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9;
> +
> +	entry[1] &= 0xffff;
> +	entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12;

Sorry, but this is unreviewable gunk. The whole msi_msg setup sucks with
this unholy macro maze. I have a half finished series which allows
architectures to provide shadow members for data, address_* so this can
be done proper with bitfields.

Aside of that it works magically because polarity,trigger and mask bit
have been set up before. But of course a comment about this is
completely overrated.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-22 21:43     ` Thomas Gleixner
@ 2020-10-22 22:10       ` Thomas Gleixner
  0 siblings, 0 replies; 58+ messages in thread
From: Thomas Gleixner @ 2020-10-22 22:10 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel, linux-hyperv

On Thu, Oct 22 2020 at 23:43, Thomas Gleixner wrote:
> On Fri, Oct 09 2020 at 11:46, David Woodhouse wrote:
> Aside of that it works magically because polarity,trigger and mask bit
> have been set up before. But of course a comment about this is
> completely overrated.

Also this part:

> -static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
> -			   struct IO_APIC_route_entry *entry)
> +static void mp_setup_entry(struct irq_data *irq_data, struct mp_chip_data *data)
>  {
> +	struct IO_APIC_route_entry *entry = &data->entry;
> +
>  	memset(entry, 0, sizeof(*entry));
> -	entry->delivery_mode = apic->irq_delivery_mode;
> -	entry->dest_mode     = apic->irq_dest_mode;
> -	entry->dest	     = cfg->dest_apicid & 0xff;
> -	entry->virt_ext_dest = cfg->dest_apicid >> 8;
> -	entry->vector	     = cfg->vector;
> +
> +	mp_swizzle_msi_dest_bits(irq_data, entry);
> +
>  	entry->trigger	     = data->trigger;
>  	entry->polarity	     = data->polarity;
>  	/*

does not make sense. It did not make sense before either, but now it
does even make less sense.

During allocation this only needs to setup the I/O-APIC specific bits
(trigger, polarity, mask). The rest is filled in when the actual
activation happens. Nothing writes that entry _before_ activation.

/me goes to mop up more


^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, back to index

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-07 12:20 [PATCH 0/5] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
2020-10-07 12:20   ` [PATCH 2/5] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
2020-10-07 12:20   ` [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
2020-10-08  9:12     ` Peter Zijlstra
2020-10-08 17:05       ` David Woodhouse
2020-10-08 11:41     ` Thomas Gleixner
2020-10-07 12:20   ` [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available David Woodhouse
2020-10-08 11:54     ` Thomas Gleixner
2020-10-08 12:02       ` Thomas Gleixner
2020-10-08 13:00       ` David Woodhouse
2020-10-07 12:20   ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
2020-10-08 12:05     ` Thomas Gleixner
2020-10-08 12:55       ` David Woodhouse
2020-10-08 16:08         ` David Woodhouse
2020-10-08 21:14           ` Thomas Gleixner
2020-10-08 21:39             ` David Woodhouse
2020-10-08 23:27               ` Thomas Gleixner
2020-10-09  6:07                 ` David Woodhouse
2020-10-10 10:06                 ` David Woodhouse
2020-10-10 11:44                   ` Thomas Gleixner
2020-10-10 11:58                     ` David Woodhouse
2020-10-11 17:12                       ` Thomas Gleixner
2020-10-11 21:15                         ` David Woodhouse
2020-10-12  9:33                           ` Thomas Gleixner
2020-10-12 16:06                             ` David Woodhouse
2020-10-12 18:38                               ` Thomas Gleixner
2020-10-12 20:20                                 ` David Woodhouse
2020-10-12 22:13                                   ` Thomas Gleixner
2020-10-13  7:52                                     ` David Woodhouse
2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
2020-10-13  8:11                                         ` [PATCH 1/9] genirq/irqdomain: Implement get_name() method on irqchip fwnodes David Woodhouse
2020-10-13  8:11                                         ` [PATCH 2/9] x86/apic: Add select() method on vector irqdomain David Woodhouse
2020-10-13  8:11                                         ` [PATCH 3/9] iommu/amd: Implement select() method on remapping irqdomain David Woodhouse
2020-10-13  8:11                                         ` [PATCH 4/9] iommu/vt-d: " David Woodhouse
2020-10-13  8:11                                         ` [PATCH 5/9] iommu/hyper-v: " David Woodhouse
2020-10-13  8:11                                         ` [PATCH 6/9] x86/hpet: Use irq_find_matching_fwspec() to find " David Woodhouse
2020-10-13  8:11                                         ` [PATCH 7/9] x86/ioapic: " David Woodhouse
2020-10-13  8:11                                         ` [PATCH 8/9] x86: Kill all traces of irq_remapping_get_irq_domain() David Woodhouse
2020-10-13  8:11                                         ` [PATCH 9/9] iommu/vt-d: Simplify intel_irq_remapping_select() David Woodhouse
2020-10-13  9:28                                       ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID Thomas Gleixner
2020-10-13 10:15                                         ` David Woodhouse
2020-10-13 10:46                                         ` Thomas Gleixner
2020-10-13 10:53                                           ` David Woodhouse
2020-10-13 11:51                                             ` David Woodhouse
2020-10-13 12:40                                               ` Thomas Gleixner
2020-10-08 11:46   ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping Thomas Gleixner
2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
2020-10-09 10:46   ` [PATCH v2 1/8] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
2020-10-09 10:46   ` [PATCH v2 2/8] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
2020-10-09 10:46   ` [PATCH v2 3/8] x86/apic: Always provide irq_compose_msi_msg() method for vector domain David Woodhouse
2020-10-09 10:46   ` [PATCH v2 4/8] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
2020-10-09 10:46   ` [PATCH v2 5/8] x86/apic: Support 15 bits of APIC ID in MSI where available David Woodhouse
2020-10-09 10:46   ` [PATCH v2 6/8] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
2020-10-09 10:46   ` [PATCH v2 7/8] x86/hpet: Move MSI support into hpet.c David Woodhouse
2020-10-09 10:46   ` [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message David Woodhouse
2020-10-22 21:43     ` Thomas Gleixner
2020-10-22 22:10       ` Thomas Gleixner

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git