linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported
@ 2020-10-07 12:20 David Woodhouse
  2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
  0 siblings, 2 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-07 12:20 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1668 bytes --]

Splitting out the simpler parts of my previous patch set. The full
support for per-irqdomain affinity limits will take a bit more work but
this part is quite simple.

Since we don't yet have per-irqdomain affinity, we currently attempt to
avoid bringing CPUs online at all if they can't be targeted by external
interrupts. Except we still let them get hotplugged later... which is
moderately suboptimal.

Fix that, and support the hypervisor enlightenment which at least
extends the range of targetable APIC IDs to 15 bits, as seen in the
patch at https://patchwork.kernel.org/patch/11816693/ for qemu.


David Woodhouse (5):
      x86/apic: Fix x2apic enablement without interrupt remapping
      x86/msi: Only use high bits of MSI address for DMAR unit
      x86/ioapic: Handle Extended Destination ID field in RTE
      x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available
      x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID

 Documentation/virt/kvm/cpuid.rst     |  4 ++++
 arch/x86/include/asm/apic.h          |  1 +
 arch/x86/include/asm/io_apic.h       |  3 ++-
 arch/x86/include/asm/mpspec.h        |  1 +
 arch/x86/include/asm/x86_init.h      |  2 ++
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kernel/apic/apic.c          | 27 +++++++++++++++++++++------
 arch/x86/kernel/apic/io_apic.c       | 19 +++++++++++++------
 arch/x86/kernel/apic/msi.c           | 41 +++++++++++++++++++++++++++++++++++------
 arch/x86/kernel/apic/x2apic_phys.c   |  9 +++++++++
 arch/x86/kernel/kvm.c                |  6 ++++++
 arch/x86/kernel/x86_init.c           |  1 +
 12 files changed, 96 insertions(+), 19 deletions(-)

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping
  2020-10-07 12:20 [PATCH 0/5] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
@ 2020-10-07 12:20 ` David Woodhouse
  2020-10-07 12:20   ` [PATCH 2/5] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
                     ` (4 more replies)
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
  1 sibling, 5 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-07 12:20 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel

From: David Woodhouse <dwmw@amazon.co.uk>

Currently, Linux as a hypervisor guest will enable x2apic only if there
are no CPUs present at boot time with an APIC ID above 255.

Hotplugging a CPU later with a higher APIC ID would result in a CPU
which cannot be targeted by external interrupts.

Add a filter in x2apic_apic_id_valid() which can be used to prevent
such CPUs from coming online, and allow x2apic to be enabled even if
they are present at boot time.

Fixes: ce69a784504 ("x86/apic: Enable x2APIC without interrupt remapping under KVM")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/apic.h        |  1 +
 arch/x86/kernel/apic/apic.c        | 14 ++++++++------
 arch/x86/kernel/apic/x2apic_phys.c |  9 +++++++++
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 1c129abb7f09..b0fd204e0023 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -259,6 +259,7 @@ static inline u64 native_x2apic_icr_read(void)
 
 extern int x2apic_mode;
 extern int x2apic_phys;
+extern void __init x2apic_set_max_apicid(u32 apicid);
 extern void __init check_x2apic(void);
 extern void x2apic_setup(void);
 static inline int x2apic_enabled(void)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b3eef1d5c903..113f6ca7b828 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1841,20 +1841,22 @@ static __init void try_to_enable_x2apic(int remap_mode)
 		return;
 
 	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
-		/* IR is required if there is APIC ID > 255 even when running
-		 * under KVM
+		/*
+		 * Using X2APIC without IR is not architecturally supported
+		 * on bare metal but may be supported in guests.
 		 */
-		if (max_physical_apicid > 255 ||
-		    !x86_init.hyper.x2apic_available()) {
+		if (!x86_init.hyper.x2apic_available()) {
 			pr_info("x2apic: IRQ remapping doesn't support X2APIC mode\n");
 			x2apic_disable();
 			return;
 		}
 
 		/*
-		 * without IR all CPUs can be addressed by IOAPIC/MSI
-		 * only in physical mode
+		 * Without IR, all CPUs can be addressed by IOAPIC/MSI only
+		 * in physical mode, and CPUs with an APIC ID that cannnot
+		 * be addressed must not be brought online.
 		 */
+		x2apic_set_max_apicid(255);
 		x2apic_phys = 1;
 	}
 	x2apic_enable();
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index bc9693841353..b4b4e89c1118 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -8,6 +8,12 @@
 int x2apic_phys;
 
 static struct apic apic_x2apic_phys;
+static u32 x2apic_max_apicid;
+
+void __init x2apic_set_max_apicid(u32 apicid)
+{
+	x2apic_max_apicid = apicid;
+}
 
 static int __init set_x2apic_phys_mode(char *arg)
 {
@@ -98,6 +104,9 @@ static int x2apic_phys_probe(void)
 /* Common x2apic functions, also used by x2apic_cluster */
 int x2apic_apic_id_valid(u32 apicid)
 {
+	if (x2apic_max_apicid && apicid > x2apic_max_apicid)
+		return 0;
+
 	return 1;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 2/5] x86/msi: Only use high bits of MSI address for DMAR unit
  2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
@ 2020-10-07 12:20   ` David Woodhouse
  2020-10-07 12:20   ` [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-07 12:20 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel

From: David Woodhouse <dwmw@amazon.co.uk>

The Intel IOMMU has an MSI-like configuration for its interrupt, but
it isn't really MSI. So it gets to abuse the high 32 bits of the address,
and puts the high 24 bits of the extended APIC ID there.

This isn't something that can be used in the general case for real MSIs,
since external devices using the high bits of the address would be
performing writes to actual memory space above 4GiB, not targeted at the
APIC.

Factor the hack out and allow it only to be used when appropriate,
adding a WARN_ON_ONCE() if other MSIs are targeted at an unreachable
APIC ID. In *theory* that should never happen since the compatibility
MSI messages are not supposed to be used with Interrupt Remapping
enabled. In practice, if IR is enabled but some devices aren't within
scope of any given remapping unit, it might happen. But that's a longer
story and this warning is the right thing to do in that case for the
short term.

The x2apic_enabled() check isn't needed because Linux won't bring up
CPUs with higher APIC IDs unless x2apic is enabled anyway.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kernel/apic/msi.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 6313f0a05db7..2825e003259c 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,13 +23,11 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
-static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
+static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+				  bool dmar)
 {
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
-	if (x2apic_enabled())
-		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
-
 	msg->address_lo =
 		MSI_ADDR_BASE_LO |
 		((apic->irq_dest_mode == 0) ?
@@ -43,18 +41,40 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 		MSI_DATA_LEVEL_ASSERT |
 		MSI_DATA_DELIVERY_FIXED |
 		MSI_DATA_VECTOR(cfg->vector);
+
+	/*
+	 * Only the IOMMU itself can use the trick of putting destination
+	 * APIC ID into the high bits of the address. Anything else would
+	 * just be writing to memory if it tried that, and needs IR to
+	 * address higher APIC IDs.
+	 */
+	if (dmar)
+		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+	else
+		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
 }
 
 void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
-	__irq_msi_compose_msg(irqd_cfg(data), msg);
+	__irq_msi_compose_msg(irqd_cfg(data), msg, false);
+}
+
+/*
+ * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the
+ * high bits of the destination APIC ID. This can't be done in the general
+ * case for MSIs as it would be targeting real memory above 4GiB not the
+ * APIC.
+ */
+static void dmar_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	__irq_msi_compose_msg(irqd_cfg(data), msg, true);
 }
 
 static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 {
 	struct msi_msg msg[2] = { [1] = { }, };
 
-	__irq_msi_compose_msg(cfg, msg);
+	__irq_msi_compose_msg(cfg, msg, 0);
 	irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
 }
 
@@ -288,6 +308,7 @@ static struct irq_chip dmar_msi_controller = {
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_set_affinity	= msi_domain_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_compose_msi_msg	= dmar_msi_compose_msg,
 	.irq_write_msi_msg	= dmar_msi_write_msg,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE
  2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
  2020-10-07 12:20   ` [PATCH 2/5] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
@ 2020-10-07 12:20   ` David Woodhouse
  2020-10-08  9:12     ` Peter Zijlstra
  2020-10-08 11:41     ` Thomas Gleixner
  2020-10-07 12:20   ` [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available David Woodhouse
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-07 12:20 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel

From: David Woodhouse <dwmw@amazon.co.uk>

The IOAPIC Redirection Table Entries contain an 8-bit Extended
Destination ID field which maps to bits 11-4 of the MSI address.

The lowest bit is used to indicate remappable format, when interrupt
remapping is in use. A hypervisor can use the other 7 bits to permit
guests to address up to 15 bits of APIC IDs, thus allowing 32768 vCPUs
before having to expose a vIOMMU and interrupt remapping to the guest.

No behavioural change in this patch, since nothing yet permits APIC IDs
above 255 to be used with the non-IR IOAPIC domain. Except for the case
where IR is enabled but there are IOAPICs which aren't in the scope of
any IOMMU, which is totally hosed anyway and needs fixing independently
of this change.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/io_apic.h |  3 ++-
 arch/x86/kernel/apic/io_apic.c | 19 +++++++++++++------
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index a1a26f6d3aa4..e65a0b7379d0 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -78,7 +78,8 @@ struct IO_APIC_route_entry {
 		mask		:  1,	/* 0: enabled, 1: disabled */
 		__reserved_2	: 15;
 
-	__u32	__reserved_3	: 24,
+	__u32	__reserved_3	: 17,
+		ext_dest	:  7,
 		dest		:  8;
 } __attribute__ ((packed));
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index a33380059db6..aa9a3b54a96c 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1239,10 +1239,10 @@ static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
 			       buf, (ir_entry->index2 << 15) | ir_entry->index,
 			       ir_entry->zero);
 		else
-			printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n",
+			printk(KERN_DEBUG "%s, %s, D(%02X%02X), M(%1d)\n",
 			       buf,
 			       entry.dest_mode == IOAPIC_DEST_MODE_LOGICAL ?
-			       "logical " : "physical",
+			       "logical " : "physical", entry.ext_dest,
 			       entry.dest, entry.delivery_mode);
 	}
 }
@@ -1410,6 +1410,7 @@ void native_restore_boot_irq_mode(void)
 	 */
 	if (ioapic_i8259.pin != -1) {
 		struct IO_APIC_route_entry entry;
+		u32 apic_id = read_apic_id();
 
 		memset(&entry, 0, sizeof(entry));
 		entry.mask		= IOAPIC_UNMASKED;
@@ -1417,7 +1418,8 @@ void native_restore_boot_irq_mode(void)
 		entry.polarity		= IOAPIC_POL_HIGH;
 		entry.dest_mode		= IOAPIC_DEST_MODE_PHYSICAL;
 		entry.delivery_mode	= dest_ExtINT;
-		entry.dest		= read_apic_id();
+		entry.dest		= apic_id & 0xff;
+		entry.ext_dest		= apic_id >> 8;
 
 		/*
 		 * Add it to the IO-APIC irq-routing table:
@@ -1861,7 +1863,8 @@ static void ioapic_configure_entry(struct irq_data *irqd)
 	 * ioapic chip to verify that.
 	 */
 	if (irqd->chip == &ioapic_chip) {
-		mpd->entry.dest = cfg->dest_apicid;
+		mpd->entry.dest = cfg->dest_apicid & 0xff;
+		mpd->entry.ext_dest = cfg->dest_apicid >> 8;
 		mpd->entry.vector = cfg->vector;
 	}
 	for_each_irq_pin(entry, mpd->irq_2_pin)
@@ -2027,6 +2030,7 @@ static inline void __init unlock_ExtINT_logic(void)
 	int apic, pin, i;
 	struct IO_APIC_route_entry entry0, entry1;
 	unsigned char save_control, save_freq_select;
+	u32 apic_id;
 
 	pin  = find_isa_irq_pin(8, mp_INT);
 	if (pin == -1) {
@@ -2042,11 +2046,13 @@ static inline void __init unlock_ExtINT_logic(void)
 	entry0 = ioapic_read_entry(apic, pin);
 	clear_IO_APIC_pin(apic, pin);
 
+	apic_id = hard_smp_processor_id();
 	memset(&entry1, 0, sizeof(entry1));
 
 	entry1.dest_mode = IOAPIC_DEST_MODE_PHYSICAL;
 	entry1.mask = IOAPIC_UNMASKED;
-	entry1.dest = hard_smp_processor_id();
+	entry1.dest = apic_id & 0xff;
+	entry1.ext_dest = apic_id >> 8;
 	entry1.delivery_mode = dest_ExtINT;
 	entry1.polarity = entry0.polarity;
 	entry1.trigger = IOAPIC_EDGE;
@@ -2949,7 +2955,8 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 	memset(entry, 0, sizeof(*entry));
 	entry->delivery_mode = apic->irq_delivery_mode;
 	entry->dest_mode     = apic->irq_dest_mode;
-	entry->dest	     = cfg->dest_apicid;
+	entry->dest	     = cfg->dest_apicid & 0xff;
+	entry->ext_dest	     = cfg->dest_apicid >> 8;
 	entry->vector	     = cfg->vector;
 	entry->trigger	     = data->trigger;
 	entry->polarity	     = data->polarity;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available
  2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
  2020-10-07 12:20   ` [PATCH 2/5] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
  2020-10-07 12:20   ` [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
@ 2020-10-07 12:20   ` David Woodhouse
  2020-10-08 11:54     ` Thomas Gleixner
  2020-10-07 12:20   ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
  2020-10-08 11:46   ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping Thomas Gleixner
  4 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-07 12:20 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel

From: David Woodhouse <dwmw@amazon.co.uk>

Some hypervisors can allow the guest to use the Extended Destination ID
field in the IOAPIC RTE and MSI address to address up to 32768 CPUs.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/mpspec.h   |  1 +
 arch/x86/include/asm/x86_init.h |  2 ++
 arch/x86/kernel/apic/apic.c     | 15 ++++++++++++++-
 arch/x86/kernel/apic/msi.c      | 10 +++++++++-
 arch/x86/kernel/x86_init.c      |  1 +
 5 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index e90ac7e9ae2c..25ee8ca0a1f2 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -42,6 +42,7 @@ extern DECLARE_BITMAP(mp_bus_not_pci, MAX_MP_BUSSES);
 extern unsigned int boot_cpu_physical_apicid;
 extern u8 boot_cpu_apic_version;
 extern unsigned long mp_lapic_addr;
+extern int msi_ext_dest_id;
 
 #ifdef CONFIG_X86_LOCAL_APIC
 extern int smp_found_config;
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 397196fae24d..5af3fe9e38f3 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -114,6 +114,7 @@ struct x86_init_pci {
  * @init_platform:		platform setup
  * @guest_late_init:		guest late init
  * @x2apic_available:		X2APIC detection
+ * @msi_ext_dest_id:		MSI and IOAPIC support 15-bit APIC IDs
  * @init_mem_mapping:		setup early mappings during init_mem_mapping()
  * @init_after_bootmem:		guest init after boot allocator is finished
  */
@@ -121,6 +122,7 @@ struct x86_hyper_init {
 	void (*init_platform)(void);
 	void (*guest_late_init)(void);
 	bool (*x2apic_available)(void);
+	bool (*msi_ext_dest_id)(void);
 	void (*init_mem_mapping)(void);
 	void (*init_after_bootmem)(void);
 };
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 113f6ca7b828..ba24a343c1f2 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1837,9 +1837,21 @@ static __init void x2apic_enable(void)
 
 static __init void try_to_enable_x2apic(int remap_mode)
 {
+	u32 apic_limit = 255;
+
 	if (x2apic_state == X2APIC_DISABLED)
 		return;
 
+	/*
+	 * If the hypervisor supports extended destination ID in IOAPIC
+	 * and MSI, that increases the maximum APIC ID that can be used
+	 * for non-remapped IRQ domains.
+	 */
+	if (x86_init.hyper.msi_ext_dest_id()) {
+		msi_ext_dest_id = 1;
+		apic_limit = 32767;
+	}
+
 	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
 		/*
 		 * Using X2APIC without IR is not architecturally supported
@@ -1856,9 +1868,10 @@ static __init void try_to_enable_x2apic(int remap_mode)
 		 * in physical mode, and CPUs with an APIC ID that cannnot
 		 * be addressed must not be brought online.
 		 */
-		x2apic_set_max_apicid(255);
+		x2apic_set_max_apicid(apic_limit);
 		x2apic_phys = 1;
 	}
+
 	x2apic_enable();
 }
 
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 2825e003259c..85206f971284 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,8 +23,11 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
+int msi_ext_dest_id __ro_after_init;
+
 static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 				  bool dmar)
+
 {
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
@@ -46,10 +49,15 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 	 * Only the IOMMU itself can use the trick of putting destination
 	 * APIC ID into the high bits of the address. Anything else would
 	 * just be writing to memory if it tried that, and needs IR to
-	 * address higher APIC IDs.
+	 * address APICs which can't be addressed in the normal 32-bit
+	 * address range at 0xFFExxxxx. That is typically just 8 bits, but
+	 * some hypervisors allow the extended destination ID field in bits
+	 * 11-5 to be used, giving support for 15 bits of APIC IDs in total.
 	 */
 	if (dmar)
 		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+	else if (msi_ext_dest_id && cfg->dest_apicid < 0x8000)
+		msg->address_lo |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid) >> 3;
 	else
 		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
 }
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index a3038d8deb6a..8b395821cb8d 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -110,6 +110,7 @@ struct x86_init_ops x86_init __initdata = {
 		.init_platform		= x86_init_noop,
 		.guest_late_init	= x86_init_noop,
 		.x2apic_available	= bool_x86_init_noop,
+		.msi_ext_dest_id	= bool_x86_init_noop,
 		.init_mem_mapping	= x86_init_noop,
 		.init_after_bootmem	= x86_init_noop,
 	},
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
                     ` (2 preceding siblings ...)
  2020-10-07 12:20   ` [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available David Woodhouse
@ 2020-10-07 12:20   ` David Woodhouse
  2020-10-08 12:05     ` Thomas Gleixner
  2020-10-08 11:46   ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping Thomas Gleixner
  4 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-07 12:20 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel

From: David Woodhouse <dwmw@amazon.co.uk>

This allows the host to indicate that IOAPIC and MSI emulation supports
15-bit destination IDs, allowing up to 32768 CPUs without interrupt
remapping.

cf. https://patchwork.kernel.org/patch/11816693/ for qemu

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/cpuid.rst     | 4 ++++
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 arch/x86/kernel/kvm.c                | 6 ++++++
 3 files changed, 11 insertions(+)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index a7dff9186bed..1726b5925d2b 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT          14          guest checks this feature bit
                                               async pf acknowledgment msr
                                               0x4b564d07.
 
+KVM_FEATURE_MSI_EXT_DEST_ID       15          guest checks this feature bit
+                                              before using extended destination
+                                              ID bits in MSI address bits 11-5.
+
 KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24          host will warn if no guest-side
                                               per-cpu warps are expeced in
                                               kvmclock
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 812e9b4c1114..950afebfba88 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -32,6 +32,7 @@
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
 #define KVM_FEATURE_ASYNC_PF_INT	14
+#define KVM_FEATURE_MSI_EXT_DEST_ID	15
 
 #define KVM_HINTS_REALTIME      0
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 1b51b727b140..4986b4399aef 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -743,12 +743,18 @@ static void __init kvm_init_platform(void)
 	x86_platform.apic_post_init = kvm_apic_init;
 }
 
+static bool __init kvm_msi_ext_dest_id(void)
+{
+	return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID);
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_kvm = {
 	.name			= "KVM",
 	.detect			= kvm_detect,
 	.type			= X86_HYPER_KVM,
 	.init.guest_late_init	= kvm_guest_init,
 	.init.x2apic_available	= kvm_para_available,
+	.init.msi_ext_dest_id	= kvm_msi_ext_dest_id,
 	.init.init_platform	= kvm_init_platform,
 };
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE
  2020-10-07 12:20   ` [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
@ 2020-10-08  9:12     ` Peter Zijlstra
  2020-10-08 17:05       ` David Woodhouse
  2020-10-08 11:41     ` Thomas Gleixner
  1 sibling, 1 reply; 192+ messages in thread
From: Peter Zijlstra @ 2020-10-08  9:12 UTC (permalink / raw)
  To: David Woodhouse; +Cc: x86, kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel

On Wed, Oct 07, 2020 at 01:20:44PM +0100, David Woodhouse wrote:
> @@ -1861,7 +1863,8 @@ static void ioapic_configure_entry(struct irq_data *irqd)
>  	 * ioapic chip to verify that.
>  	 */
>  	if (irqd->chip == &ioapic_chip) {
> -		mpd->entry.dest = cfg->dest_apicid;
> +		mpd->entry.dest = cfg->dest_apicid & 0xff;
> +		mpd->entry.ext_dest = cfg->dest_apicid >> 8;
>  		mpd->entry.vector = cfg->vector;
>  	}
>  	for_each_irq_pin(entry, mpd->irq_2_pin)

All the other sites did memset(0) before the assignment, and this the
extra unconditional write of 0 to ext_dest is harmless.

This might be true for this site too, but it wasn't immediately obvious.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE
  2020-10-07 12:20   ` [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
  2020-10-08  9:12     ` Peter Zijlstra
@ 2020-10-08 11:41     ` Thomas Gleixner
  1 sibling, 0 replies; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-08 11:41 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Wed, Oct 07 2020 at 13:20, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> The IOAPIC Redirection Table Entries contain an 8-bit Extended
> Destination ID field which maps to bits 11-4 of the MSI address.
>
> The lowest bit is used to indicate remappable format, when interrupt
> remapping is in use. A hypervisor can use the other 7 bits to permit
> guests to address up to 15 bits of APIC IDs, thus allowing 32768 vCPUs
> before having to expose a vIOMMU and interrupt remapping to the guest.
>
> No behavioural change in this patch, since nothing yet permits APIC IDs
> above 255 to be used with the non-IR IOAPIC domain. Except for the case
> where IR is enabled but there are IOAPICs which aren't in the scope of
> any IOMMU, which is totally hosed anyway and needs fixing independently
> of this change.

Again: IOAPICs which are not covered by IR are detected and prevent IR
enablement which makes the above a fairy tale. Changelogs are about
facts.

> diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
> index a1a26f6d3aa4..e65a0b7379d0 100644
> --- a/arch/x86/include/asm/io_apic.h
> +++ b/arch/x86/include/asm/io_apic.h
> @@ -78,7 +78,8 @@ struct IO_APIC_route_entry {
>  		mask		:  1,	/* 0: enabled, 1: disabled */
>  		__reserved_2	: 15;
>  
> -	__u32	__reserved_3	: 24,
> +	__u32	__reserved_3	: 17,
> +		ext_dest	:  7,

This wants to be explicitely named 'virt_ext_dest'.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping
  2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
                     ` (3 preceding siblings ...)
  2020-10-07 12:20   ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
@ 2020-10-08 11:46   ` Thomas Gleixner
  4 siblings, 0 replies; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-08 11:46 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Wed, Oct 07 2020 at 13:20, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>  
>  static struct apic apic_x2apic_phys;
> +static u32 x2apic_max_apicid;

__ro_after_init?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available
  2020-10-07 12:20   ` [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available David Woodhouse
@ 2020-10-08 11:54     ` Thomas Gleixner
  2020-10-08 12:02       ` Thomas Gleixner
  2020-10-08 13:00       ` David Woodhouse
  0 siblings, 2 replies; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-08 11:54 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Wed, Oct 07 2020 at 13:20, David Woodhouse wrote:
>  
> +	/*
> +	 * If the hypervisor supports extended destination ID in IOAPIC
> +	 * and MSI, that increases the maximum APIC ID that can be used
> +	 * for non-remapped IRQ domains.
> +	 */
> +	if (x86_init.hyper.msi_ext_dest_id()) {
> +		msi_ext_dest_id = 1;
> +		apic_limit = 32767;
> +	}

This needs to be outside of the remap mode check because?

> +
>  	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
>  		/*
>  		 * Using X2APIC without IR is not architecturally supported
> @@ -1856,9 +1868,10 @@ static __init void try_to_enable_x2apic(int remap_mode)
>  		 * in physical mode, and CPUs with an APIC ID that cannnot
>  		 * be addressed must not be brought online.
>  		 */
> -		x2apic_set_max_apicid(255);
> +		x2apic_set_max_apicid(apic_limit);
>  		x2apic_phys = 1;
>  	}
> +
>  	x2apic_enable();
>  }
>  
> diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
> index 2825e003259c..85206f971284 100644
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -23,8 +23,11 @@
>  
>  struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
>  
> +int msi_ext_dest_id __ro_after_init;

bool please.

Aside of that this breaks the build for a kernel with CONFIG_PCI_MSI=n

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available
  2020-10-08 11:54     ` Thomas Gleixner
@ 2020-10-08 12:02       ` Thomas Gleixner
  2020-10-08 13:00       ` David Woodhouse
  1 sibling, 0 replies; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-08 12:02 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Thu, Oct 08 2020 at 13:54, Thomas Gleixner wrote:
> On Wed, Oct 07 2020 at 13:20, David Woodhouse wrote:
>> diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
>> index 2825e003259c..85206f971284 100644
>> --- a/arch/x86/kernel/apic/msi.c
>> +++ b/arch/x86/kernel/apic/msi.c
>> @@ -23,8 +23,11 @@
>>  
>>  struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
>>  
>> +int msi_ext_dest_id __ro_after_init;
>
> bool please.
>
> Aside of that this breaks the build for a kernel with CONFIG_PCI_MSI=n

So this wants to be

bool virt_ext_dest_id __ro_after_init;

in apic.c and then please make the IO/APIC places depend on this as well
so any change to the utilization of the reserved IO/APIC bits in the
future is not going to end up in surprises.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-07 12:20   ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
@ 2020-10-08 12:05     ` Thomas Gleixner
  2020-10-08 12:55       ` David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-08 12:05 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Wed, Oct 07 2020 at 13:20, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> This allows the host to indicate that IOAPIC and MSI emulation supports
> 15-bit destination IDs, allowing up to 32768 CPUs without interrupt
> remapping.
>
> cf. https://patchwork.kernel.org/patch/11816693/ for qemu
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  Documentation/virt/kvm/cpuid.rst     | 4 ++++
>  arch/x86/include/uapi/asm/kvm_para.h | 1 +
>  arch/x86/kernel/kvm.c                | 6 ++++++
>  3 files changed, 11 insertions(+)
>
> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index a7dff9186bed..1726b5925d2b 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT          14          guest checks this feature bit
>                                                async pf acknowledgment msr
>                                                0x4b564d07.
>  
> +KVM_FEATURE_MSI_EXT_DEST_ID       15          guest checks this feature bit
> +                                              before using extended destination
> +                                              ID bits in MSI address
> bits 11-5.

Why MSI_EXT_DEST_ID? It's enabling that for MSI and IO/APIC. The
underlying mechanism might be the same, but APIC_EXT_DEST_ID is more
general and then you might also make the explanation of that bit match
the changelog.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-08 12:05     ` Thomas Gleixner
@ 2020-10-08 12:55       ` David Woodhouse
  2020-10-08 16:08         ` David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-08 12:55 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1819 bytes --]

On Thu, 2020-10-08 at 14:05 +0200, Thomas Gleixner wrote:
> Why MSI_EXT_DEST_ID? It's enabling that for MSI and IO/APIC. The
> underlying mechanism might be the same, but APIC_EXT_DEST_ID is more
> general and then you might also make the explanation of that bit
> match the changelog.

It's enabling it for *everything* that generates MSI cycles — which
includes IOAPIC, HPET, and MSI-capable PCI devices. Hell, and anything
else which feels like generating a physical address cycle to 0xFEExxxxx
addresses.

Again, the IOAPIC is just a device for turning pin-based interrupts
into MSIs. Bits 19-4 of the address it writes to are taken directly
from bits 63-48 of the IOAPIC RTE. There's complexity elsewhere but for
*those* bits, It just uses the bits it's given, just like a PCI device
or an HPET would. 

When I implemented this in qemu I didn't even *touch* the IOAPIC
support; it doesn't affect the IOAPIC at all, just like it doesn't
affect the HPET or any of the MSI-capable PCI devices that qemu
emulates. They just put the bits on the bus that they're told to, when
they want to generate an interrupt.

This feature is an MSI feature.

Not an HPET feature.

Not a PCI feature.

Not an IOAPIC feature.

The fact that a *Linux* guest has special-case knowledge in its IOAPIC
driver that duplicates what the MSI message composition does, is not a
good justification for naming the feature bit bizarrely.

In fact I'm really tempted to make Linux's io_apic.c just use
irq_chip_compose_msi_msg() and swizzle the bits out of the message
identically for IR and non-IR alike (modulo the pin hack), and delete
the IR_IO_APIC_route_entry struct completely. 

That also completely removes the ext_dest_id trick from visibility in
io_apic.c. And might avoid further confusion.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available
  2020-10-08 11:54     ` Thomas Gleixner
  2020-10-08 12:02       ` Thomas Gleixner
@ 2020-10-08 13:00       ` David Woodhouse
  1 sibling, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-08 13:00 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1713 bytes --]

On Thu, 2020-10-08 at 13:54 +0200, Thomas Gleixner wrote:
> On Wed, Oct 07 2020 at 13:20, David Woodhouse wrote:
> >  
> > +	/*
> > +	 * If the hypervisor supports extended destination ID in IOAPIC
> > +	 * and MSI, that increases the maximum APIC ID that can be used
> > +	 * for non-remapped IRQ domains.
> > +	 */
> > +	if (x86_init.hyper.msi_ext_dest_id()) {
> > +		msi_ext_dest_id = 1;
> > +		apic_limit = 32767;
> > +	}
> 
> This needs to be outside of the remap mode check because?


Once upon a time, there was a later patch in the series which *also*
used the apic_limit variable to generate a maximum affinity mask.

Now we've ditched that idea, I can put this back inside the remap mode
check.

> 
> > +
> >  	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
> >  		/*
> >  		 * Using X2APIC without IR is not architecturally supported
> > @@ -1856,9 +1868,10 @@ static __init void try_to_enable_x2apic(int remap_mode)
> >  		 * in physical mode, and CPUs with an APIC ID that cannnot
> >  		 * be addressed must not be brought online.
> >  		 */
> > -		x2apic_set_max_apicid(255);
> > +		x2apic_set_max_apicid(apic_limit);
> >  		x2apic_phys = 1;
> >  	}
> > +
> >  	x2apic_enable();
> >  }
> >  
> > diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
> > index 2825e003259c..85206f971284 100644
> > --- a/arch/x86/kernel/apic/msi.c
> > +++ b/arch/x86/kernel/apic/msi.c
> > @@ -23,8 +23,11 @@
> >  
> >  struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
> >  
> > +int msi_ext_dest_id __ro_after_init;
> 
> bool please.
> 
> Aside of that this breaks the build for a kernel with CONFIG_PCI_MSI=n

Will fix (and rename).


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-08 12:55       ` David Woodhouse
@ 2020-10-08 16:08         ` David Woodhouse
  2020-10-08 21:14           ` Thomas Gleixner
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-08 16:08 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 13159 bytes --]

On Thu, 2020-10-08 at 13:55 +0100, David Woodhouse wrote:
> 
> In fact I'm really tempted to make Linux's io_apic.c just use
> irq_chip_compose_msi_msg() and swizzle the bits out of the message
> identically for IR and non-IR alike (modulo the pin hack), and delete
> the IR_IO_APIC_route_entry struct completely. 
> 
> That also completely removes the ext_dest_id trick from visibility in
> io_apic.c. And might avoid further confusion.

That looks a bit like this, FWIW. Turns out it doesn't *entirely*
remove the ext_dest_id trick from io_apic.c because it does still need
to put together an entry manually for the ExtINT hackery and for
restoring boot mode. But it does remove a lot of incestuous ioapic
hackery from IOMMU drivers, and deletes more code than it adds.

It has the additional effect of enabling the WARN_ON for unreachable
APIC IDs in __irq_msi_compose_msg(), for IOAPIC interrupts too.

(We'd want the x86_vector_domain to actually have an MSI compose
function in the !CONFIG_PCI_MSI case if we did this, of course.)

From 2fbc79588d4677ee1cc9df661162fcf1a7da57f0 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@amazon.co.uk>
Date: Thu, 8 Oct 2020 15:44:42 +0100
Subject: [PATCH 6/5] x86/ioapic: Generate RTE directly from parent irqchip's MSI
 message

The IOAPIC generates an MSI cycle with address/data bits taken from its
Redirection Table Entry in some combination which used to make sense,
but now is just a bunch of bits which get passed through in some
seemingly arbitrary order.

Instead of making IRQ remapping drivers directly frob the IOAPIC RTE,
let them just do their job and generate an MSI message. The bit
swizzling to turn that MSI message into the IOAPIC's RTE is the same in
all cases, since it's a function of the IOAPIC hardware. The IRQ
remappers have no real need to get involved with that.

The only slight caveat is that the IOAPIC is interpreting some of
those fields too, and it does want the 'vector' field to be unique
to make EOI work. The AMD IOMMU happens to put its IRTE index in the
bits that the IOAPIC thinks are the vector field, and accommodates
this requirement by reserving the first 32 indices for the IOAPIC.
The Intel IOMMU doesn't actually use the bits that the IOAPIC thinks
are the vector field, so it fills in the 'pin' value there instead.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/hw_irq.h       | 11 +++---
 arch/x86/include/asm/msidef.h       |  2 ++
 arch/x86/kernel/apic/io_apic.c      | 55 ++++++++++++++++++-----------
 drivers/iommu/amd/iommu.c           | 14 --------
 drivers/iommu/hyperv-iommu.c        | 31 ----------------
 drivers/iommu/intel/irq_remapping.c | 19 +++-------
 6 files changed, 46 insertions(+), 86 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index a4aeeaace040..aabd8f1b6bb0 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -45,12 +45,11 @@ enum irq_alloc_type {
 };
 
 struct ioapic_alloc_info {
-	int				pin;
-	int				node;
-	u32				trigger : 1;
-	u32				polarity : 1;
-	u32				valid : 1;
-	struct IO_APIC_route_entry	*entry;
+	int		pin;
+	int		node;
+	u32		trigger : 1;
+	u32		polarity : 1;
+	u32		valid : 1;
 };
 
 struct uv_alloc_info {
diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index ee2f8ccc32d0..37c3d2d492c9 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -18,6 +18,7 @@
 #define MSI_DATA_DELIVERY_MODE_SHIFT	8
 #define  MSI_DATA_DELIVERY_FIXED	(0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
+#define  MSI_DATA_DELIVERY_MODE_MASK	(3 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
 #define MSI_DATA_LEVEL_SHIFT		14
 #define	 MSI_DATA_LEVEL_DEASSERT	(0 << MSI_DATA_LEVEL_SHIFT)
@@ -37,6 +38,7 @@
 #define MSI_ADDR_DEST_MODE_SHIFT	2
 #define  MSI_ADDR_DEST_MODE_PHYSICAL	(0 << MSI_ADDR_DEST_MODE_SHIFT)
 #define	 MSI_ADDR_DEST_MODE_LOGICAL	(1 << MSI_ADDR_DEST_MODE_SHIFT)
+#define  MSI_ADDR_DEST_MODE_MASK	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
 #define MSI_ADDR_REDIRECTION_SHIFT	3
 #define  MSI_ADDR_REDIRECTION_CPU	(0 << MSI_ADDR_REDIRECTION_SHIFT)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 54f6a029b1d1..ca2da19d5c55 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -48,6 +48,7 @@
 #include <linux/jiffies.h>	/* time_after() */
 #include <linux/slab.h>
 #include <linux/memblock.h>
+#include <linux/msi.h>
 
 #include <asm/irqdomain.h>
 #include <asm/io.h>
@@ -63,6 +64,7 @@
 #include <asm/setup.h>
 #include <asm/irq_remapping.h>
 #include <asm/hw_irq.h>
+#include <asm/msidef.h>
 
 #include <asm/apic.h>
 
@@ -1851,22 +1853,36 @@ static void ioapic_ir_ack_level(struct irq_data *irq_data)
 	eoi_ioapic_pin(data->entry.vector, data);
 }
 
+static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, void *_entry)
+{
+	struct msi_msg msg;
+	u32 *entry = _entry;
+
+	irq_chip_compose_msi_msg(irq_data, &msg);
+
+	/*
+	 * They're in a bit of a random order for historical reasons, but
+	 * the IO/APIC is just a device for turning interrupt lines into
+	 * MSIs, and various bits of the MSI addr/data are just swizzled
+	 * into/from the bits of Redirection Table Entry.
+	 */
+	entry[0] &= 0xfffff000;
+	entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK |
+				 MSI_DATA_VECTOR_MASK));
+	entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9;
+
+	entry[1] &= 0xffff;
+	entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12;
+}
+
+
 static void ioapic_configure_entry(struct irq_data *irqd)
 {
 	struct mp_chip_data *mpd = irqd->chip_data;
-	struct irq_cfg *cfg = irqd_cfg(irqd);
 	struct irq_pin_list *entry;
 
-	/*
-	 * Only update when the parent is the vector domain, don't touch it
-	 * if the parent is the remapping domain. Check the installed
-	 * ioapic chip to verify that.
-	 */
-	if (irqd->chip == &ioapic_chip) {
-		mpd->entry.dest = cfg->dest_apicid & 0xff;
-		mpd->entry.virt_ext_dest = cfg->dest_apicid >> 8;
-		mpd->entry.vector = cfg->vector;
-	}
+	mp_swizzle_msi_dest_bits(irqd, &mpd->entry);
+
 	for_each_irq_pin(entry, mpd->irq_2_pin)
 		__ioapic_write_entry(entry->apic, entry->pin, mpd->entry);
 }
@@ -2949,15 +2965,14 @@ static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data,
 	}
 }
 
-static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
-			   struct IO_APIC_route_entry *entry)
+static void mp_setup_entry(struct irq_data *irq_data, struct mp_chip_data *data)
 {
+	struct IO_APIC_route_entry *entry = &data->entry;
+
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode = apic->irq_delivery_mode;
-	entry->dest_mode     = apic->irq_dest_mode;
-	entry->dest	     = cfg->dest_apicid & 0xff;
-	entry->virt_ext_dest = cfg->dest_apicid >> 8;
-	entry->vector	     = cfg->vector;
+
+	mp_swizzle_msi_dest_bits(irq_data, entry);
+
 	entry->trigger	     = data->trigger;
 	entry->polarity	     = data->polarity;
 	/*
@@ -2995,7 +3010,6 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	if (!data)
 		return -ENOMEM;
 
-	info->ioapic.entry = &data->entry;
 	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info);
 	if (ret < 0) {
 		kfree(data);
@@ -3013,8 +3027,7 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	add_pin_to_irq_node(data, ioapic_alloc_attr_node(info), ioapic, pin);
 
 	local_irq_save(flags);
-	if (info->ioapic.entry)
-		mp_setup_entry(cfg, data, info->ioapic.entry);
+	mp_setup_entry(irq_data, data);
 	mp_register_handler(virq, data->trigger);
 	if (virq < nr_legacy_irqs())
 		legacy_pic->mask(virq);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ef64e01f66d7..13d0a8f42d56 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3597,7 +3597,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 {
 	struct irq_2_irte *irte_info = &data->irq_2_irte;
 	struct msi_msg *msg = &data->msi_entry;
-	struct IO_APIC_route_entry *entry;
 	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
 	if (!iommu)
@@ -3611,19 +3610,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-		/* Setup IOAPIC entry */
-		entry = info->ioapic.entry;
-		info->ioapic.entry = NULL;
-		memset(entry, 0, sizeof(*entry));
-		entry->vector        = index;
-		entry->mask          = 0;
-		entry->trigger       = info->ioapic.trigger;
-		entry->polarity      = info->ioapic.polarity;
-		/* Mask level triggered irqs. */
-		if (info->ioapic.trigger)
-			entry->mask = 1;
-		break;
-
 	case X86_IRQ_ALLOC_TYPE_HPET:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e09e2d734c57..37dd485a5640 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -40,7 +40,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 {
 	struct irq_data *parent = data->parent_data;
 	struct irq_cfg *cfg = irqd_cfg(data);
-	struct IO_APIC_route_entry *entry;
 	int ret;
 
 	/* Return error If new irq affinity is out of ioapic_max_cpumask. */
@@ -51,9 +50,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
 		return ret;
 
-	entry = data->chip_data;
-	entry->dest = cfg->dest_apicid;
-	entry->vector = cfg->vector;
 	send_cleanup_vector(cfg);
 
 	return 0;
@@ -89,20 +85,6 @@ static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
 
 	irq_data->chip = &hyperv_ir_chip;
 
-	/*
-	 * If there is interrupt remapping function of IOMMU, setting irq
-	 * affinity only needs to change IRTE of IOMMU. But Hyper-V doesn't
-	 * support interrupt remapping function, setting irq affinity of IO-APIC
-	 * interrupts still needs to change IO-APIC registers. But ioapic_
-	 * configure_entry() will ignore value of cfg->vector and cfg->
-	 * dest_apicid when IO-APIC's parent irq domain is not the vector
-	 * domain.(See ioapic_configure_entry()) In order to setting vector
-	 * and dest_apicid to IO-APIC register, IO-APIC entry pointer is saved
-	 * in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_
-	 * affinity() set vector and dest_apicid directly into IO-APIC entry.
-	 */
-	irq_data->chip_data = info->ioapic.entry;
-
 	/*
 	 * Hypver-V IO APIC irq affinity should be in the scope of
 	 * ioapic_max_cpumask because no irq remapping support.
@@ -119,22 +101,9 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain,
 	irq_domain_free_irqs_common(domain, virq, nr_irqs);
 }
 
-static int hyperv_irq_remapping_activate(struct irq_domain *domain,
-			  struct irq_data *irq_data, bool reserve)
-{
-	struct irq_cfg *cfg = irqd_cfg(irq_data);
-	struct IO_APIC_route_entry *entry = irq_data->chip_data;
-
-	entry->dest = cfg->dest_apicid;
-	entry->vector = cfg->vector;
-
-	return 0;
-}
-
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
 	.alloc = hyperv_irq_remapping_alloc,
 	.free = hyperv_irq_remapping_free,
-	.activate = hyperv_irq_remapping_activate,
 };
 
 static int __init hyperv_prepare_irq_remapping(void)
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 0cfce1d3b7bb..511dfb4884bc 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1265,7 +1265,6 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 					     struct irq_alloc_info *info,
 					     int index, int sub_handle)
 {
-	struct IR_IO_APIC_route_entry *entry;
 	struct irte *irte = &data->irte_entry;
 	struct msi_msg *msg = &data->msi_entry;
 
@@ -1281,23 +1280,15 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 			irte->avail, irte->vector, irte->dest_id,
 			irte->sid, irte->sq, irte->svt);
 
-		entry = (struct IR_IO_APIC_route_entry *)info->ioapic.entry;
-		info->ioapic.entry = NULL;
-		memset(entry, 0, sizeof(*entry));
-		entry->index2	= (index >> 15) & 0x1;
-		entry->zero	= 0;
-		entry->format	= 1;
-		entry->index	= (index & 0x7fff);
 		/*
 		 * IO-APIC RTE will be configured with virtual vector.
 		 * irq handler will do the explicit EOI to the io-apic.
 		 */
-		entry->vector	= info->ioapic.pin;
-		entry->mask	= 0;			/* enable IRQ */
-		entry->trigger	= info->ioapic.trigger;
-		entry->polarity	= info->ioapic.polarity;
-		if (info->ioapic.trigger)
-			entry->mask = 1; /* Mask level triggered irqs. */
+		msg->data = info->ioapic.pin;
+		msg->address_hi = MSI_ADDR_BASE_HI;
+		msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
+				  MSI_ADDR_IR_INDEX1(index) |
+				  MSI_ADDR_IR_INDEX2(index);
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
-- 
2.17.1







[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE
  2020-10-08  9:12     ` Peter Zijlstra
@ 2020-10-08 17:05       ` David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-08 17:05 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: x86, kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1326 bytes --]

On Thu, 2020-10-08 at 11:12 +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2020 at 01:20:44PM +0100, David Woodhouse wrote:
> > @@ -1861,7 +1863,8 @@ static void ioapic_configure_entry(struct irq_data *irqd)
> >  	 * ioapic chip to verify that.
> >  	 */
> >  	if (irqd->chip == &ioapic_chip) {
> > -		mpd->entry.dest = cfg->dest_apicid;
> > +		mpd->entry.dest = cfg->dest_apicid & 0xff;
> > +		mpd->entry.ext_dest = cfg->dest_apicid >> 8;
> >  		mpd->entry.vector = cfg->vector;
> >  	}
> >  	for_each_irq_pin(entry, mpd->irq_2_pin)
> 
> All the other sites did memset(0) before the assignment, and this the
> extra unconditional write of 0 to ext_dest is harmless.
> 
> This might be true for this site too, but it wasn't immediately obvious.

Yeah. Really, that whole 16-bit field ought to be called
'msi_addr_bits_19_to_4' since there's no interpretation by the IOAPIC
and it's *just* passed on in the MSI cycle. See my later patch which
stops making them up for ourselves in the IOAPIC code and *just*
swizzles them out of the MSI message which is composed for us by
upstream.

In this case, since this piece of code is based on the knowledge that
the upstream controller is accepting MSI in the x86 Compatibility
Format, those seven bits are never going to have been used for anything
else.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-08 16:08         ` David Woodhouse
@ 2020-10-08 21:14           ` Thomas Gleixner
  2020-10-08 21:39             ` David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-08 21:14 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Thu, Oct 08 2020 at 17:08, David Woodhouse wrote:
> On Thu, 2020-10-08 at 13:55 +0100, David Woodhouse wrote:
>
> (We'd want the x86_vector_domain to actually have an MSI compose
> function in the !CONFIG_PCI_MSI case if we did this, of course.)

The compose function and the vector domain wrapper can simply move to vector.c

> From 2fbc79588d4677ee1cc9df661162fcf1a7da57f0 Mon Sep 17 00:00:00 2001
> From: David Woodhouse <dwmw@amazon.co.uk>
> Date: Thu, 8 Oct 2020 15:44:42 +0100
> Subject: [PATCH 6/5] x86/ioapic: Generate RTE directly from parent irqchip's MSI
>  message
>
> The IOAPIC generates an MSI cycle with address/data bits taken from its
> Redirection Table Entry in some combination which used to make sense,
> but now is just a bunch of bits which get passed through in some
> seemingly arbitrary order.
>
> Instead of making IRQ remapping drivers directly frob the IOAPIC RTE,
> let them just do their job and generate an MSI message. The bit
> swizzling to turn that MSI message into the IOAPIC's RTE is the same in
> all cases, since it's a function of the IOAPIC hardware. The IRQ
> remappers have no real need to get involved with that.
>
> The only slight caveat is that the IOAPIC is interpreting some of
> those fields too, and it does want the 'vector' field to be unique
> to make EOI work. The AMD IOMMU happens to put its IRTE index in the
> bits that the IOAPIC thinks are the vector field, and accommodates
> this requirement by reserving the first 32 indices for the IOAPIC.
> The Intel IOMMU doesn't actually use the bits that the IOAPIC thinks
> are the vector field, so it fills in the 'pin' value there instead.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/include/asm/hw_irq.h       | 11 +++---
>  arch/x86/include/asm/msidef.h       |  2 ++
>  arch/x86/kernel/apic/io_apic.c      | 55 ++++++++++++++++++-----------
>  drivers/iommu/amd/iommu.c           | 14 --------
>  drivers/iommu/hyperv-iommu.c        | 31 ----------------
>  drivers/iommu/intel/irq_remapping.c | 19 +++-------
>  6 files changed, 46 insertions(+), 86 deletions(-)

Nice :)

> +static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, void *_entry)
> +{
> +	struct msi_msg msg;
> +	u32 *entry = _entry;
> +
> +	irq_chip_compose_msi_msg(irq_data, &msg);

Duh, for some stupid reason it never occured to me to do it that
way.

Historically the MSI compose function was part of the MSI PCI chip and I
just changed that recently when I reworked the code to make IMS support
possible.

Historical blinders are pretty sticky :(

> +	/*
> +	 * They're in a bit of a random order for historical reasons, but
> +	 * the IO/APIC is just a device for turning interrupt lines into
> +	 * MSIs, and various bits of the MSI addr/data are just swizzled
> +	 * into/from the bits of Redirection Table Entry.
> +	 */
> +	entry[0] &= 0xfffff000;
> +	entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK |
> +				 MSI_DATA_VECTOR_MASK));
> +	entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9;
> +
> +	entry[1] &= 0xffff;
> +	entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12;
> +}

....

>  	switch (info->type) {
>  	case X86_IRQ_ALLOC_TYPE_IOAPIC:
> -		/* Setup IOAPIC entry */
> -		entry = info->ioapic.entry;
> -		info->ioapic.entry = NULL;
> -		memset(entry, 0, sizeof(*entry));
> -		entry->vector        = index;
> -		entry->mask          = 0;
> -		entry->trigger       = info->ioapic.trigger;
> -		entry->polarity      = info->ioapic.polarity;
> -		/* Mask level triggered irqs. */
> -		if (info->ioapic.trigger)
> -			entry->mask = 1;
> -		break;
> -

Thanks for cleaning this up!

       tglx


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-08 21:14           ` Thomas Gleixner
@ 2020-10-08 21:39             ` David Woodhouse
  2020-10-08 23:27               ` Thomas Gleixner
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-08 21:39 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 682 bytes --]

On Thu, 2020-10-08 at 23:14 +0200, Thomas Gleixner wrote:
> On Thu, Oct 08 2020 at 17:08, David Woodhouse wrote:
> > On Thu, 2020-10-08 at 13:55 +0100, David Woodhouse wrote:
> > 
> > (We'd want the x86_vector_domain to actually have an MSI compose
> > function in the !CONFIG_PCI_MSI case if we did this, of course.)
> 
> The compose function and the vector domain wrapper can simply move to
> vector.c

I ended up putting __irq_msi_compose_msg() into apic.c and that way I
can make virt_ext_dest_id static in that file.

And then I can move all the HPET-MSI support into hpet.c too.

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-08 21:39             ` David Woodhouse
@ 2020-10-08 23:27               ` Thomas Gleixner
  2020-10-09  6:07                 ` David Woodhouse
  2020-10-10 10:06                 ` David Woodhouse
  0 siblings, 2 replies; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-08 23:27 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Thu, Oct 08 2020 at 22:39, David Woodhouse wrote:
> On Thu, 2020-10-08 at 23:14 +0200, Thomas Gleixner wrote:
>> > 
>> > (We'd want the x86_vector_domain to actually have an MSI compose
>> > function in the !CONFIG_PCI_MSI case if we did this, of course.)
>> 
>> The compose function and the vector domain wrapper can simply move to
>> vector.c
>
> I ended up putting __irq_msi_compose_msg() into apic.c and that way I
> can make virt_ext_dest_id static in that file.
>
> And then I can move all the HPET-MSI support into hpet.c too.

Works for me.

> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id

For the next submission, can you please

 - pick up the -ENODEV changes for HPET/IOAPIC which I posted earlier

 - shuffle all that compose/IOAPIC cleanup around

before adding that extended dest id stuff.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-08 23:27               ` Thomas Gleixner
@ 2020-10-09  6:07                 ` David Woodhouse
  2020-10-10 10:06                 ` David Woodhouse
  1 sibling, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-09  6:07 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel



On 9 October 2020 00:27:06 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Thu, Oct 08 2020 at 22:39, David Woodhouse wrote:
>> On Thu, 2020-10-08 at 23:14 +0200, Thomas Gleixner wrote:
>>> > 
>>> > (We'd want the x86_vector_domain to actually have an MSI compose
>>> > function in the !CONFIG_PCI_MSI case if we did this, of course.)
>>> 
>>> The compose function and the vector domain wrapper can simply move
>to
>>> vector.c
>>
>> I ended up putting __irq_msi_compose_msg() into apic.c and that way I
>> can make virt_ext_dest_id static in that file.
>>
>> And then I can move all the HPET-MSI support into hpet.c too.
>
>Works for me.
>
>>
>https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id
>
>For the next submission, can you please
>
> - pick up the -ENODEV changes for HPET/IOAPIC which I posted earlier

Ack.

> - shuffle all that compose/IOAPIC cleanup around

I'd prefer the MSI swizzling bit to stay last in the series. I want to stare hard at the hyperv-iommu part a bit more, and ideally even have it tested. I'd prefer the real functionality that I care about, not to depend on that cleanup.

If it actually let me remove all mention of ext_dest_id from the IOAPIC code and use *only* MSI swizzling, I'd be keener to reorder. But as noted, there are a couple of manual RTE constructions in there still anyway.

I can move __irq_compose_msi_msg() earlier in the series though, and then virt_ext_dest_id can be static in apic.c from its inception.

OK?

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported
  2020-10-07 12:20 [PATCH 0/5] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
  2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
@ 2020-10-09 10:46 ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 1/8] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
                     ` (7 more replies)
  1 sibling, 8 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv


Fix the conditions for enabling x2apic on guests without interrupt 
remapping, and support 15-bit Extended Destination ID to allow 32768 
CPUs without IR on hypervisors that support it.

The last patch in the series now makes io_apic.c generate its RTE from 
the MSI message created by the parent irqchip, and removes all the nasty 
hackery where IRQ remapping drivers would frob I/OAPIC RTEs for 
themselves directly. It's last because I'd quite like to see it tested 
especially with Hyper-V and it doesn't actually eliminate the need for 
io_apic.c to know about the 15-bit extension anyway.

v2:
 • Minor cleanups.
 • Move __irq_msi_compose_msg() to apic.c, make virt_ext_dest_id static.
 • Generate I/OAPIC RTE directly from parent irqchip's MSI messages.
 • Clean up HPET MSI support into hpet.c now that we can.

David Woodhouse (8):
      x86/apic: Fix x2apic enablement without interrupt remapping
      x86/msi: Only use high bits of MSI address for DMAR unit
      x86/apic: Always provide irq_compose_msi_msg() method for vector domain
      x86/ioapic: Handle Extended Destination ID field in RTE
      x86/apic: Support 15 bits of APIC ID in MSI where available
      x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
      x86/hpet: Move MSI support into hpet.c
      x86/ioapic: Generate RTE directly from parent irqchip's MSI message

 Documentation/virt/kvm/cpuid.rst     |   4 +
 arch/x86/include/asm/apic.h          |   9 +--
 arch/x86/include/asm/hpet.h          |  11 ---
 arch/x86/include/asm/hw_irq.h        |  11 ++-
 arch/x86/include/asm/io_apic.h       |   3 +-
 arch/x86/include/asm/msidef.h        |   2 +
 arch/x86/include/asm/x86_init.h      |   2 +
 arch/x86/include/uapi/asm/kvm_para.h |   1 +
 arch/x86/kernel/apic/apic.c          |  68 ++++++++++++++--
 arch/x86/kernel/apic/io_apic.c       |  66 +++++++++------
 arch/x86/kernel/apic/msi.c           | 152 +++--------------------------------
 arch/x86/kernel/apic/vector.c        |   6 ++
 arch/x86/kernel/apic/x2apic_phys.c   |   9 +++
 arch/x86/kernel/hpet.c               | 116 ++++++++++++++++++++++++--
 arch/x86/kernel/kvm.c                |   6 ++
 arch/x86/kernel/x86_init.c           |   1 +
 drivers/iommu/amd/iommu.c            |  14 ----
 drivers/iommu/hyperv-iommu.c         |  31 -------
 drivers/iommu/intel/irq_remapping.c  |  19 ++---
 19 files changed, 276 insertions(+), 255 deletions(-)


^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v2 1/8] x86/apic: Fix x2apic enablement without interrupt remapping
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 2/8] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

Currently, Linux as a hypervisor guest will enable x2apic only if there
are no CPUs present at boot time with an APIC ID above 255.

Hotplugging a CPU later with a higher APIC ID would result in a CPU
which cannot be targeted by external interrupts.

Add a filter in x2apic_apic_id_valid() which can be used to prevent
such CPUs from coming online, and allow x2apic to be enabled even if
they are present at boot time.

Fixes: ce69a784504 ("x86/apic: Enable x2APIC without interrupt remapping under KVM")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/apic.h        |  1 +
 arch/x86/kernel/apic/apic.c        | 14 ++++++++------
 arch/x86/kernel/apic/x2apic_phys.c |  9 +++++++++
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 1c129abb7f09..b0fd204e0023 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -259,6 +259,7 @@ static inline u64 native_x2apic_icr_read(void)
 
 extern int x2apic_mode;
 extern int x2apic_phys;
+extern void __init x2apic_set_max_apicid(u32 apicid);
 extern void __init check_x2apic(void);
 extern void x2apic_setup(void);
 static inline int x2apic_enabled(void)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b3eef1d5c903..113f6ca7b828 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1841,20 +1841,22 @@ static __init void try_to_enable_x2apic(int remap_mode)
 		return;
 
 	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
-		/* IR is required if there is APIC ID > 255 even when running
-		 * under KVM
+		/*
+		 * Using X2APIC without IR is not architecturally supported
+		 * on bare metal but may be supported in guests.
 		 */
-		if (max_physical_apicid > 255 ||
-		    !x86_init.hyper.x2apic_available()) {
+		if (!x86_init.hyper.x2apic_available()) {
 			pr_info("x2apic: IRQ remapping doesn't support X2APIC mode\n");
 			x2apic_disable();
 			return;
 		}
 
 		/*
-		 * without IR all CPUs can be addressed by IOAPIC/MSI
-		 * only in physical mode
+		 * Without IR, all CPUs can be addressed by IOAPIC/MSI only
+		 * in physical mode, and CPUs with an APIC ID that cannnot
+		 * be addressed must not be brought online.
 		 */
+		x2apic_set_max_apicid(255);
 		x2apic_phys = 1;
 	}
 	x2apic_enable();
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index bc9693841353..e14eae6d6ea7 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -8,6 +8,12 @@
 int x2apic_phys;
 
 static struct apic apic_x2apic_phys;
+static u32 x2apic_max_apicid __ro_after_init;
+
+void __init x2apic_set_max_apicid(u32 apicid)
+{
+	x2apic_max_apicid = apicid;
+}
 
 static int __init set_x2apic_phys_mode(char *arg)
 {
@@ -98,6 +104,9 @@ static int x2apic_phys_probe(void)
 /* Common x2apic functions, also used by x2apic_cluster */
 int x2apic_apic_id_valid(u32 apicid)
 {
+	if (x2apic_max_apicid && apicid > x2apic_max_apicid)
+		return 0;
+
 	return 1;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 2/8] x86/msi: Only use high bits of MSI address for DMAR unit
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 1/8] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 3/8] x86/apic: Always provide irq_compose_msi_msg() method for vector domain David Woodhouse
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

The Intel IOMMU has an MSI-like configuration for its interrupt, but
it isn't really MSI. So it gets to abuse the high 32 bits of the address,
and puts the high 24 bits of the extended APIC ID there.

This isn't something that can be used in the general case for real MSIs,
since external devices using the high bits of the address would be
performing writes to actual memory space above 4GiB, not targeted at the
APIC.

Factor the hack out and allow it only to be used when appropriate,
adding a WARN_ON_ONCE() if other MSIs are targeted at an unreachable
APIC ID. That should never happen since the compatibility MSI messages
are not used when Interrupt Remapping is enabled.

The x2apic_enabled() check isn't needed because Linux won't bring up
CPUs with higher APIC IDs unless IR and x2apic are enabled anyway.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kernel/apic/msi.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 6313f0a05db7..516df47bde73 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,13 +23,11 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
-static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
+static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+				  bool dmar)
 {
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
-	if (x2apic_enabled())
-		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
-
 	msg->address_lo =
 		MSI_ADDR_BASE_LO |
 		((apic->irq_dest_mode == 0) ?
@@ -43,18 +41,29 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 		MSI_DATA_LEVEL_ASSERT |
 		MSI_DATA_DELIVERY_FIXED |
 		MSI_DATA_VECTOR(cfg->vector);
+
+	/*
+	 * Only the IOMMU itself can use the trick of putting destination
+	 * APIC ID into the high bits of the address. Anything else would
+	 * just be writing to memory if it tried that, and needs IR to
+	 * address higher APIC IDs.
+	 */
+	if (dmar)
+		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+	else
+		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
 }
 
 void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
-	__irq_msi_compose_msg(irqd_cfg(data), msg);
+	__irq_msi_compose_msg(irqd_cfg(data), msg, false);
 }
 
 static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 {
 	struct msi_msg msg[2] = { [1] = { }, };
 
-	__irq_msi_compose_msg(cfg, msg);
+	__irq_msi_compose_msg(cfg, msg, false);
 	irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
 }
 
@@ -276,6 +285,17 @@ struct irq_domain *arch_create_remap_msi_irq_domain(struct irq_domain *parent,
 #endif
 
 #ifdef CONFIG_DMAR_TABLE
+/*
+ * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the
+ * high bits of the destination APIC ID. This can't be done in the general
+ * case for MSIs as it would be targeting real memory above 4GiB not the
+ * APIC.
+ */
+static void dmar_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	__irq_msi_compose_msg(irqd_cfg(data), msg, true);
+}
+
 static void dmar_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
 {
 	dmar_msi_write(data->irq, msg);
@@ -288,6 +308,7 @@ static struct irq_chip dmar_msi_controller = {
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_set_affinity	= msi_domain_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_compose_msi_msg	= dmar_msi_compose_msg,
 	.irq_write_msi_msg	= dmar_msi_write_msg,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 3/8] x86/apic: Always provide irq_compose_msi_msg() method for vector domain
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 1/8] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 2/8] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 4/8] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

This shouldn't be dependent on PCI_MSI. HPET and I/OAPIC can deliver
interrupts through MSI without having any PCI in the system at all.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/apic.h   |  8 +++-----
 arch/x86/kernel/apic/apic.c   | 32 +++++++++++++++++++++++++++++++
 arch/x86/kernel/apic/msi.c    | 36 -----------------------------------
 arch/x86/kernel/apic/vector.c |  6 ++++++
 4 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index b0fd204e0023..f5b88fb723bf 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -521,12 +521,10 @@ static inline void apic_smt_update(void) { }
 #endif
 
 struct msi_msg;
+struct irq_cfg;
 
-#ifdef CONFIG_PCI_MSI
-void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg);
-#else
-# define x86_vector_msi_compose_msg NULL
-#endif
+extern void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+				  bool dmar);
 
 extern void ioapic_zap_locks(void);
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 113f6ca7b828..fba3ba383ad2 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -50,6 +50,7 @@
 #include <asm/io_apic.h>
 #include <asm/desc.h>
 #include <asm/hpet.h>
+#include <asm/msidef.h>
 #include <asm/mtrr.h>
 #include <asm/time.h>
 #include <asm/smp.h>
@@ -2480,6 +2481,37 @@ int hard_smp_processor_id(void)
 	return read_apic_id();
 }
 
+void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+			   bool dmar)
+{
+	msg->address_hi = MSI_ADDR_BASE_HI;
+
+	msg->address_lo =
+		MSI_ADDR_BASE_LO |
+		((apic->irq_dest_mode == 0) ?
+			MSI_ADDR_DEST_MODE_PHYSICAL :
+			MSI_ADDR_DEST_MODE_LOGICAL) |
+		MSI_ADDR_REDIRECTION_CPU |
+		MSI_ADDR_DEST_ID(cfg->dest_apicid);
+
+	msg->data =
+		MSI_DATA_TRIGGER_EDGE |
+		MSI_DATA_LEVEL_ASSERT |
+		MSI_DATA_DELIVERY_FIXED |
+		MSI_DATA_VECTOR(cfg->vector);
+
+	/*
+	 * Only the IOMMU itself can use the trick of putting destination
+	 * APIC ID into the high bits of the address. Anything else would
+	 * just be writing to memory if it tried that, and needs IR to
+	 * address higher APIC IDs.
+	 */
+	if (dmar)
+		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+	else
+		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
+}
+
 /*
  * Override the generic EOI implementation with an optimized version.
  * Only called during early boot when only one CPU is active and with
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 516df47bde73..de585cfa4d6c 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,42 +23,6 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
-static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
-				  bool dmar)
-{
-	msg->address_hi = MSI_ADDR_BASE_HI;
-
-	msg->address_lo =
-		MSI_ADDR_BASE_LO |
-		((apic->irq_dest_mode == 0) ?
-			MSI_ADDR_DEST_MODE_PHYSICAL :
-			MSI_ADDR_DEST_MODE_LOGICAL) |
-		MSI_ADDR_REDIRECTION_CPU |
-		MSI_ADDR_DEST_ID(cfg->dest_apicid);
-
-	msg->data =
-		MSI_DATA_TRIGGER_EDGE |
-		MSI_DATA_LEVEL_ASSERT |
-		MSI_DATA_DELIVERY_FIXED |
-		MSI_DATA_VECTOR(cfg->vector);
-
-	/*
-	 * Only the IOMMU itself can use the trick of putting destination
-	 * APIC ID into the high bits of the address. Anything else would
-	 * just be writing to memory if it tried that, and needs IR to
-	 * address higher APIC IDs.
-	 */
-	if (dmar)
-		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
-	else
-		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
-}
-
-void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
-{
-	__irq_msi_compose_msg(irqd_cfg(data), msg, false);
-}
-
 static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 {
 	struct msi_msg msg[2] = { [1] = { }, };
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 1eac53632786..bb2e2a2488a5 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -818,6 +818,12 @@ void apic_ack_edge(struct irq_data *irqd)
 	apic_ack_irq(irqd);
 }
 
+static void x86_vector_msi_compose_msg(struct irq_data *data,
+				       struct msi_msg *msg)
+{
+       __irq_msi_compose_msg(irqd_cfg(data), msg, false);
+}
+
 static struct irq_chip lapic_controller = {
 	.name			= "APIC",
 	.irq_ack		= apic_ack_edge,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 4/8] x86/ioapic: Handle Extended Destination ID field in RTE
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
                     ` (2 preceding siblings ...)
  2020-10-09 10:46   ` [PATCH v2 3/8] x86/apic: Always provide irq_compose_msi_msg() method for vector domain David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 5/8] x86/apic: Support 15 bits of APIC ID in MSI where available David Woodhouse
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

Bits 63-48 of the I/OAPIC Redirection Table Entry map directly to
bits 19-4 of the address used in the resulting MSI cycle.

Historically, the x86 MSI format only used the top 8 of those 16 bits as
the destination APIC ID, and the "Extended Destination ID" in the lower
8 bits was unused.

With interrupt remapping, the lowest bit of the Extended Destination ID
(bit 48 of RTE, bit 4 of MSI address) is now used to indicate a
remappable format MSI.

A hypervisor can use the other 7 bits of the Extended Destination ID to
permit guests to address up to 15 bits of APIC IDs, thus allowing 32768
vCPUs before having to expose a vIOMMU and interrupt remapping to the
guest.

This enlightenment could theoretically be transparent to the I/OAPIC
code if it were always generating its RTE from an MSI message created by
the parent irqchip. That cleanup will happen separately but doesn't cover
all cases — for the ExtINT hackery and restoring boot mode, RTEs are
still generated locally. So we have to teach the I/OAPIC about the
ext_dest bits anyway.

No behavioural change in this patch, since nothing yet permits APIC IDs
above 255 to be used with the non-IR I/OAPIC domain.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/io_apic.h |  3 ++-
 arch/x86/kernel/apic/io_apic.c | 19 +++++++++++++------
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index a1a26f6d3aa4..5bb3cf4ff2fd 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -78,7 +78,8 @@ struct IO_APIC_route_entry {
 		mask		:  1,	/* 0: enabled, 1: disabled */
 		__reserved_2	: 15;
 
-	__u32	__reserved_3	: 24,
+	__u32	__reserved_3	: 17,
+		virt_ext_dest	:  7,
 		dest		:  8;
 } __attribute__ ((packed));
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index a33380059db6..54f6a029b1d1 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1239,10 +1239,10 @@ static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
 			       buf, (ir_entry->index2 << 15) | ir_entry->index,
 			       ir_entry->zero);
 		else
-			printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n",
+			printk(KERN_DEBUG "%s, %s, D(%02X%02X), M(%1d)\n",
 			       buf,
 			       entry.dest_mode == IOAPIC_DEST_MODE_LOGICAL ?
-			       "logical " : "physical",
+			       "logical " : "physical", entry.virt_ext_dest,
 			       entry.dest, entry.delivery_mode);
 	}
 }
@@ -1410,6 +1410,7 @@ void native_restore_boot_irq_mode(void)
 	 */
 	if (ioapic_i8259.pin != -1) {
 		struct IO_APIC_route_entry entry;
+		u32 apic_id = read_apic_id();
 
 		memset(&entry, 0, sizeof(entry));
 		entry.mask		= IOAPIC_UNMASKED;
@@ -1417,7 +1418,8 @@ void native_restore_boot_irq_mode(void)
 		entry.polarity		= IOAPIC_POL_HIGH;
 		entry.dest_mode		= IOAPIC_DEST_MODE_PHYSICAL;
 		entry.delivery_mode	= dest_ExtINT;
-		entry.dest		= read_apic_id();
+		entry.dest		= apic_id & 0xff;
+		entry.virt_ext_dest	= apic_id >> 8;
 
 		/*
 		 * Add it to the IO-APIC irq-routing table:
@@ -1861,7 +1863,8 @@ static void ioapic_configure_entry(struct irq_data *irqd)
 	 * ioapic chip to verify that.
 	 */
 	if (irqd->chip == &ioapic_chip) {
-		mpd->entry.dest = cfg->dest_apicid;
+		mpd->entry.dest = cfg->dest_apicid & 0xff;
+		mpd->entry.virt_ext_dest = cfg->dest_apicid >> 8;
 		mpd->entry.vector = cfg->vector;
 	}
 	for_each_irq_pin(entry, mpd->irq_2_pin)
@@ -2027,6 +2030,7 @@ static inline void __init unlock_ExtINT_logic(void)
 	int apic, pin, i;
 	struct IO_APIC_route_entry entry0, entry1;
 	unsigned char save_control, save_freq_select;
+	u32 apic_id;
 
 	pin  = find_isa_irq_pin(8, mp_INT);
 	if (pin == -1) {
@@ -2042,11 +2046,13 @@ static inline void __init unlock_ExtINT_logic(void)
 	entry0 = ioapic_read_entry(apic, pin);
 	clear_IO_APIC_pin(apic, pin);
 
+	apic_id = hard_smp_processor_id();
 	memset(&entry1, 0, sizeof(entry1));
 
 	entry1.dest_mode = IOAPIC_DEST_MODE_PHYSICAL;
 	entry1.mask = IOAPIC_UNMASKED;
-	entry1.dest = hard_smp_processor_id();
+	entry1.dest = apic_id & 0xff;
+	entry1.virt_ext_dest = apic_id >> 8;
 	entry1.delivery_mode = dest_ExtINT;
 	entry1.polarity = entry0.polarity;
 	entry1.trigger = IOAPIC_EDGE;
@@ -2949,7 +2955,8 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 	memset(entry, 0, sizeof(*entry));
 	entry->delivery_mode = apic->irq_delivery_mode;
 	entry->dest_mode     = apic->irq_dest_mode;
-	entry->dest	     = cfg->dest_apicid;
+	entry->dest	     = cfg->dest_apicid & 0xff;
+	entry->virt_ext_dest = cfg->dest_apicid >> 8;
 	entry->vector	     = cfg->vector;
 	entry->trigger	     = data->trigger;
 	entry->polarity	     = data->polarity;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 5/8] x86/apic: Support 15 bits of APIC ID in MSI where available
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
                     ` (3 preceding siblings ...)
  2020-10-09 10:46   ` [PATCH v2 4/8] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 6/8] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

Some hypervisors can allow the guest to use the Extended Destination ID
field in the MSI address to address up to 32768 CPUs.

This applies to all downstream devices which generate MSI cycles,
including HPET, I/OAPIC and PCI MSI.

HPET and PCI MSI use the same __irq_msi_compose_msg() function, while
I/OAPIC generates its own and had support for the extended bits added in
a previous commit.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/x86_init.h |  2 ++
 arch/x86/kernel/apic/apic.c     | 26 ++++++++++++++++++++++++--
 arch/x86/kernel/x86_init.c      |  1 +
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 397196fae24d..96a719fb2e53 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -114,6 +114,7 @@ struct x86_init_pci {
  * @init_platform:		platform setup
  * @guest_late_init:		guest late init
  * @x2apic_available:		X2APIC detection
+ * @msi_ext_dest_id:		MSI supports 15-bit APIC IDs
  * @init_mem_mapping:		setup early mappings during init_mem_mapping()
  * @init_after_bootmem:		guest init after boot allocator is finished
  */
@@ -121,6 +122,7 @@ struct x86_hyper_init {
 	void (*init_platform)(void);
 	void (*guest_late_init)(void);
 	bool (*x2apic_available)(void);
+	bool (*msi_ext_dest_id)(void);
 	void (*init_mem_mapping)(void);
 	void (*init_after_bootmem)(void);
 };
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index fba3ba383ad2..62e769c267c3 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -94,6 +94,11 @@ static unsigned int disabled_cpu_apicid __ro_after_init = BAD_APICID;
  */
 static int apic_extnmi __ro_after_init = APIC_EXTNMI_BSP;
 
+/*
+ * Hypervisor supports 15 bits of APIC ID in MSI Extended Destination ID
+ */
+static bool virt_ext_dest_id __ro_after_init;
+
 /*
  * Map cpu index to physical APIC ID
  */
@@ -1842,6 +1847,8 @@ static __init void try_to_enable_x2apic(int remap_mode)
 		return;
 
 	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
+		u32 apic_limit = 255;
+
 		/*
 		 * Using X2APIC without IR is not architecturally supported
 		 * on bare metal but may be supported in guests.
@@ -1852,12 +1859,22 @@ static __init void try_to_enable_x2apic(int remap_mode)
 			return;
 		}
 
+		/*
+		 * If the hypervisor supports extended destination ID in
+		 * MSI, that increases the maximum APIC ID that can be
+		 * used for non-remapped IRQ domains.
+		 */
+		if (x86_init.hyper.msi_ext_dest_id()) {
+			virt_ext_dest_id = 1;
+			apic_limit = 32767;
+		}
+
 		/*
 		 * Without IR, all CPUs can be addressed by IOAPIC/MSI only
 		 * in physical mode, and CPUs with an APIC ID that cannnot
 		 * be addressed must not be brought online.
 		 */
-		x2apic_set_max_apicid(255);
+		x2apic_set_max_apicid(apic_limit);
 		x2apic_phys = 1;
 	}
 	x2apic_enable();
@@ -2504,10 +2521,15 @@ void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 	 * Only the IOMMU itself can use the trick of putting destination
 	 * APIC ID into the high bits of the address. Anything else would
 	 * just be writing to memory if it tried that, and needs IR to
-	 * address higher APIC IDs.
+	 * address APICs which can't be addressed in the normal 32-bit
+	 * address range at 0xFFExxxxx. That is typically just 8 bits, but
+	 * some hypervisors allow the extended destination ID field in bits
+	 * 11-5 to be used, giving support for 15 bits of APIC IDs in total.
 	 */
 	if (dmar)
 		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+	else if (virt_ext_dest_id && cfg->dest_apicid < 0x8000)
+		msg->address_lo |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid) >> 3;
 	else
 		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
 }
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index a3038d8deb6a..8b395821cb8d 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -110,6 +110,7 @@ struct x86_init_ops x86_init __initdata = {
 		.init_platform		= x86_init_noop,
 		.guest_late_init	= x86_init_noop,
 		.x2apic_available	= bool_x86_init_noop,
+		.msi_ext_dest_id	= bool_x86_init_noop,
 		.init_mem_mapping	= x86_init_noop,
 		.init_after_bootmem	= x86_init_noop,
 	},
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 6/8] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
                     ` (4 preceding siblings ...)
  2020-10-09 10:46   ` [PATCH v2 5/8] x86/apic: Support 15 bits of APIC ID in MSI where available David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 7/8] x86/hpet: Move MSI support into hpet.c David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message David Woodhouse
  7 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

This allows the host to indicate that MSI emulation supports 15-bit
destination IDs, allowing up to 32768 CPUs without interrupt remapping.

cf. https://patchwork.kernel.org/patch/11816693/ for qemu

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/cpuid.rst     | 4 ++++
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 arch/x86/kernel/kvm.c                | 6 ++++++
 3 files changed, 11 insertions(+)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index a7dff9186bed..1726b5925d2b 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT          14          guest checks this feature bit
                                               async pf acknowledgment msr
                                               0x4b564d07.
 
+KVM_FEATURE_MSI_EXT_DEST_ID       15          guest checks this feature bit
+                                              before using extended destination
+                                              ID bits in MSI address bits 11-5.
+
 KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24          host will warn if no guest-side
                                               per-cpu warps are expeced in
                                               kvmclock
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 812e9b4c1114..950afebfba88 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -32,6 +32,7 @@
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
 #define KVM_FEATURE_ASYNC_PF_INT	14
+#define KVM_FEATURE_MSI_EXT_DEST_ID	15
 
 #define KVM_HINTS_REALTIME      0
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 1b51b727b140..4986b4399aef 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -743,12 +743,18 @@ static void __init kvm_init_platform(void)
 	x86_platform.apic_post_init = kvm_apic_init;
 }
 
+static bool __init kvm_msi_ext_dest_id(void)
+{
+	return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID);
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_kvm = {
 	.name			= "KVM",
 	.detect			= kvm_detect,
 	.type			= X86_HYPER_KVM,
 	.init.guest_late_init	= kvm_guest_init,
 	.init.x2apic_available	= kvm_para_available,
+	.init.msi_ext_dest_id	= kvm_msi_ext_dest_id,
 	.init.init_platform	= kvm_init_platform,
 };
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 7/8] x86/hpet: Move MSI support into hpet.c
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
                     ` (5 preceding siblings ...)
  2020-10-09 10:46   ` [PATCH v2 6/8] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-09 10:46   ` [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message David Woodhouse
  7 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

This isn't really dependent on PCI MSI; it's just generic MSI which is
now supported by the generic x86_vector_domain. Move the HPET MSI
support back into hpet.c with the rest of the HPET support.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/hpet.h |  11 ----
 arch/x86/kernel/apic/msi.c  | 111 ----------------------------------
 arch/x86/kernel/hpet.c      | 116 ++++++++++++++++++++++++++++++++++--
 3 files changed, 111 insertions(+), 127 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 6352dee37cda..ab9f3dd87c80 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -74,17 +74,6 @@ extern void hpet_disable(void);
 extern unsigned int hpet_readl(unsigned int a);
 extern void force_hpet_resume(void);
 
-struct irq_data;
-struct hpet_channel;
-struct irq_domain;
-
-extern void hpet_msi_unmask(struct irq_data *data);
-extern void hpet_msi_mask(struct irq_data *data);
-extern void hpet_msi_write(struct hpet_channel *hc, struct msi_msg *msg);
-extern struct irq_domain *hpet_create_irq_domain(int hpet_id);
-extern int hpet_assign_irq(struct irq_domain *domain,
-			   struct hpet_channel *hc, int dev_num);
-
 #ifdef CONFIG_HPET_EMULATE_RTC
 
 #include <linux/interrupt.h>
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index de585cfa4d6c..74190f83c034 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -341,114 +341,3 @@ void dmar_free_hwirq(int irq)
 	irq_domain_free_irqs(irq, 1);
 }
 #endif
-
-/*
- * MSI message composition
- */
-#ifdef CONFIG_HPET_TIMER
-static inline int hpet_dev_id(struct irq_domain *domain)
-{
-	struct msi_domain_info *info = msi_get_domain_info(domain);
-
-	return (int)(long)info->data;
-}
-
-static void hpet_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
-{
-	hpet_msi_write(irq_data_get_irq_handler_data(data), msg);
-}
-
-static struct irq_chip hpet_msi_controller __ro_after_init = {
-	.name = "HPET-MSI",
-	.irq_unmask = hpet_msi_unmask,
-	.irq_mask = hpet_msi_mask,
-	.irq_ack = irq_chip_ack_parent,
-	.irq_set_affinity = msi_domain_set_affinity,
-	.irq_retrigger = irq_chip_retrigger_hierarchy,
-	.irq_write_msi_msg = hpet_msi_write_msg,
-	.flags = IRQCHIP_SKIP_SET_WAKE,
-};
-
-static int hpet_msi_init(struct irq_domain *domain,
-			 struct msi_domain_info *info, unsigned int virq,
-			 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
-{
-	irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
-	irq_domain_set_info(domain, virq, arg->hwirq, info->chip, NULL,
-			    handle_edge_irq, arg->data, "edge");
-
-	return 0;
-}
-
-static void hpet_msi_free(struct irq_domain *domain,
-			  struct msi_domain_info *info, unsigned int virq)
-{
-	irq_clear_status_flags(virq, IRQ_MOVE_PCNTXT);
-}
-
-static struct msi_domain_ops hpet_msi_domain_ops = {
-	.msi_init	= hpet_msi_init,
-	.msi_free	= hpet_msi_free,
-};
-
-static struct msi_domain_info hpet_msi_domain_info = {
-	.ops		= &hpet_msi_domain_ops,
-	.chip		= &hpet_msi_controller,
-	.flags		= MSI_FLAG_USE_DEF_DOM_OPS,
-};
-
-struct irq_domain *hpet_create_irq_domain(int hpet_id)
-{
-	struct msi_domain_info *domain_info;
-	struct irq_domain *parent, *d;
-	struct irq_alloc_info info;
-	struct fwnode_handle *fn;
-
-	if (x86_vector_domain == NULL)
-		return NULL;
-
-	domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
-	if (!domain_info)
-		return NULL;
-
-	*domain_info = hpet_msi_domain_info;
-	domain_info->data = (void *)(long)hpet_id;
-
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-	info.devid = hpet_id;
-	parent = irq_remapping_get_irq_domain(&info);
-	if (parent == NULL)
-		parent = x86_vector_domain;
-	else
-		hpet_msi_controller.name = "IR-HPET-MSI";
-
-	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
-					      hpet_id);
-	if (!fn) {
-		kfree(domain_info);
-		return NULL;
-	}
-
-	d = msi_create_irq_domain(fn, domain_info, parent);
-	if (!d) {
-		irq_domain_free_fwnode(fn);
-		kfree(domain_info);
-	}
-	return d;
-}
-
-int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc,
-		    int dev_num)
-{
-	struct irq_alloc_info info;
-
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET;
-	info.data = hc;
-	info.devid = hpet_dev_id(domain);
-	info.hwirq = dev_num;
-
-	return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
-}
-#endif
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 7a50f0b62a70..11fd2676fb1d 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -7,6 +7,7 @@
 #include <linux/cpu.h>
 #include <linux/irq.h>
 
+#include <asm/irq_remapping.h>
 #include <asm/hpet.h>
 #include <asm/time.h>
 
@@ -467,9 +468,8 @@ static void __init hpet_legacy_clockevent_register(struct hpet_channel *hc)
 /*
  * HPET MSI Support
  */
-#ifdef CONFIG_PCI_MSI
-
-void hpet_msi_unmask(struct irq_data *data)
+#ifdef CONFIG_GENERIC_MSI_IRQ
+static void hpet_msi_unmask(struct irq_data *data)
 {
 	struct hpet_channel *hc = irq_data_get_irq_handler_data(data);
 	unsigned int cfg;
@@ -479,7 +479,7 @@ void hpet_msi_unmask(struct irq_data *data)
 	hpet_writel(cfg, HPET_Tn_CFG(hc->num));
 }
 
-void hpet_msi_mask(struct irq_data *data)
+static void hpet_msi_mask(struct irq_data *data)
 {
 	struct hpet_channel *hc = irq_data_get_irq_handler_data(data);
 	unsigned int cfg;
@@ -489,12 +489,118 @@ void hpet_msi_mask(struct irq_data *data)
 	hpet_writel(cfg, HPET_Tn_CFG(hc->num));
 }
 
-void hpet_msi_write(struct hpet_channel *hc, struct msi_msg *msg)
+static void hpet_msi_write(struct hpet_channel *hc, struct msi_msg *msg)
 {
 	hpet_writel(msg->data, HPET_Tn_ROUTE(hc->num));
 	hpet_writel(msg->address_lo, HPET_Tn_ROUTE(hc->num) + 4);
 }
 
+static void hpet_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	hpet_msi_write(irq_data_get_irq_handler_data(data), msg);
+}
+
+static struct irq_chip hpet_msi_controller __ro_after_init = {
+	.name = "HPET-MSI",
+	.irq_unmask = hpet_msi_unmask,
+	.irq_mask = hpet_msi_mask,
+	.irq_ack = irq_chip_ack_parent,
+	.irq_set_affinity = msi_domain_set_affinity,
+	.irq_retrigger = irq_chip_retrigger_hierarchy,
+	.irq_write_msi_msg = hpet_msi_write_msg,
+	.flags = IRQCHIP_SKIP_SET_WAKE,
+};
+
+static int hpet_msi_init(struct irq_domain *domain,
+			 struct msi_domain_info *info, unsigned int virq,
+			 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
+{
+	irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
+	irq_domain_set_info(domain, virq, arg->hwirq, info->chip, NULL,
+			    handle_edge_irq, arg->data, "edge");
+
+	return 0;
+}
+
+static void hpet_msi_free(struct irq_domain *domain,
+			  struct msi_domain_info *info, unsigned int virq)
+{
+	irq_clear_status_flags(virq, IRQ_MOVE_PCNTXT);
+}
+
+static struct msi_domain_ops hpet_msi_domain_ops = {
+	.msi_init	= hpet_msi_init,
+	.msi_free	= hpet_msi_free,
+};
+
+static struct msi_domain_info hpet_msi_domain_info = {
+	.ops		= &hpet_msi_domain_ops,
+	.chip		= &hpet_msi_controller,
+	.flags		= MSI_FLAG_USE_DEF_DOM_OPS,
+};
+
+static struct irq_domain *hpet_create_irq_domain(int hpet_id)
+{
+	struct msi_domain_info *domain_info;
+	struct irq_domain *parent, *d;
+	struct irq_alloc_info info;
+	struct fwnode_handle *fn;
+
+	if (x86_vector_domain == NULL)
+		return NULL;
+
+	domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
+	if (!domain_info)
+		return NULL;
+
+	*domain_info = hpet_msi_domain_info;
+	domain_info->data = (void *)(long)hpet_id;
+
+	init_irq_alloc_info(&info, NULL);
+	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
+	info.devid = hpet_id;
+	parent = irq_remapping_get_irq_domain(&info);
+	if (parent == NULL)
+		parent = x86_vector_domain;
+	else
+		hpet_msi_controller.name = "IR-HPET-MSI";
+
+	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
+					      hpet_id);
+	if (!fn) {
+		kfree(domain_info);
+		return NULL;
+	}
+
+	d = msi_create_irq_domain(fn, domain_info, parent);
+	if (!d) {
+		irq_domain_free_fwnode(fn);
+		kfree(domain_info);
+	}
+	return d;
+}
+
+static inline int hpet_dev_id(struct irq_domain *domain)
+{
+	struct msi_domain_info *info = msi_get_domain_info(domain);
+
+	return (int)(long)info->data;
+}
+
+static int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc,
+			   int dev_num)
+{
+	struct irq_alloc_info info;
+
+	init_irq_alloc_info(&info, NULL);
+	info.type = X86_IRQ_ALLOC_TYPE_HPET;
+	info.data = hc;
+	info.devid = hpet_dev_id(domain);
+	info.hwirq = dev_num;
+
+	return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
+}
+
 static int hpet_clkevt_msi_resume(struct clock_event_device *evt)
 {
 	struct hpet_channel *hc = clockevent_to_channel(evt);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
                     ` (6 preceding siblings ...)
  2020-10-09 10:46   ` [PATCH v2 7/8] x86/hpet: Move MSI support into hpet.c David Woodhouse
@ 2020-10-09 10:46   ` David Woodhouse
  2020-10-22 21:43     ` Thomas Gleixner
  7 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-09 10:46 UTC (permalink / raw)
  To: x86; +Cc: kvm, Thomas Gleixner, Paolo Bonzini, linux-kernel, linux-hyperv

From: David Woodhouse <dwmw@amazon.co.uk>

The I/OAPIC generates an MSI cycle with address/data bits taken from its
Redirection Table Entry in some combination which used to make sense,
but now is just a bunch of bits which get passed through in some
seemingly arbitrary order.

Instead of making IRQ remapping drivers directly frob the I/OAPIC RTE,
let them just do their job and generate an MSI message. The bit
swizzling to turn that MSI message into the IOAPIC's RTE is the same in
all cases, since it's a function of the I/OAPIC hardware. The IRQ
remappers have no real need to get involved with that.

The only slight caveat is that the I/OAPIC is interpreting some of
those fields too, and it does want the 'vector' field to be unique
to make EOI work. The AMD IOMMU happens to put its IRTE index in the
bits that the I/OAPIC thinks are the vector field, and accommodates
this requirement by reserving the first 32 indices for the I/OAPIC.
The Intel IOMMU doesn't actually use the bits that the I/OAPIC thinks
are the vector field, so it fills in the 'pin' value there instead.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/hw_irq.h       | 11 +++---
 arch/x86/include/asm/msidef.h       |  2 ++
 arch/x86/kernel/apic/io_apic.c      | 55 ++++++++++++++++++-----------
 drivers/iommu/amd/iommu.c           | 14 --------
 drivers/iommu/hyperv-iommu.c        | 31 ----------------
 drivers/iommu/intel/irq_remapping.c | 19 +++-------
 6 files changed, 46 insertions(+), 86 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index a4aeeaace040..aabd8f1b6bb0 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -45,12 +45,11 @@ enum irq_alloc_type {
 };
 
 struct ioapic_alloc_info {
-	int				pin;
-	int				node;
-	u32				trigger : 1;
-	u32				polarity : 1;
-	u32				valid : 1;
-	struct IO_APIC_route_entry	*entry;
+	int		pin;
+	int		node;
+	u32		trigger : 1;
+	u32		polarity : 1;
+	u32		valid : 1;
 };
 
 struct uv_alloc_info {
diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index ee2f8ccc32d0..37c3d2d492c9 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -18,6 +18,7 @@
 #define MSI_DATA_DELIVERY_MODE_SHIFT	8
 #define  MSI_DATA_DELIVERY_FIXED	(0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
+#define  MSI_DATA_DELIVERY_MODE_MASK	(3 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
 #define MSI_DATA_LEVEL_SHIFT		14
 #define	 MSI_DATA_LEVEL_DEASSERT	(0 << MSI_DATA_LEVEL_SHIFT)
@@ -37,6 +38,7 @@
 #define MSI_ADDR_DEST_MODE_SHIFT	2
 #define  MSI_ADDR_DEST_MODE_PHYSICAL	(0 << MSI_ADDR_DEST_MODE_SHIFT)
 #define	 MSI_ADDR_DEST_MODE_LOGICAL	(1 << MSI_ADDR_DEST_MODE_SHIFT)
+#define  MSI_ADDR_DEST_MODE_MASK	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
 #define MSI_ADDR_REDIRECTION_SHIFT	3
 #define  MSI_ADDR_REDIRECTION_CPU	(0 << MSI_ADDR_REDIRECTION_SHIFT)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 54f6a029b1d1..ca2da19d5c55 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -48,6 +48,7 @@
 #include <linux/jiffies.h>	/* time_after() */
 #include <linux/slab.h>
 #include <linux/memblock.h>
+#include <linux/msi.h>
 
 #include <asm/irqdomain.h>
 #include <asm/io.h>
@@ -63,6 +64,7 @@
 #include <asm/setup.h>
 #include <asm/irq_remapping.h>
 #include <asm/hw_irq.h>
+#include <asm/msidef.h>
 
 #include <asm/apic.h>
 
@@ -1851,22 +1853,36 @@ static void ioapic_ir_ack_level(struct irq_data *irq_data)
 	eoi_ioapic_pin(data->entry.vector, data);
 }
 
+static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, void *_entry)
+{
+	struct msi_msg msg;
+	u32 *entry = _entry;
+
+	irq_chip_compose_msi_msg(irq_data, &msg);
+
+	/*
+	 * They're in a bit of a random order for historical reasons, but
+	 * the IO/APIC is just a device for turning interrupt lines into
+	 * MSIs, and various bits of the MSI addr/data are just swizzled
+	 * into/from the bits of Redirection Table Entry.
+	 */
+	entry[0] &= 0xfffff000;
+	entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK |
+				 MSI_DATA_VECTOR_MASK));
+	entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9;
+
+	entry[1] &= 0xffff;
+	entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12;
+}
+
+
 static void ioapic_configure_entry(struct irq_data *irqd)
 {
 	struct mp_chip_data *mpd = irqd->chip_data;
-	struct irq_cfg *cfg = irqd_cfg(irqd);
 	struct irq_pin_list *entry;
 
-	/*
-	 * Only update when the parent is the vector domain, don't touch it
-	 * if the parent is the remapping domain. Check the installed
-	 * ioapic chip to verify that.
-	 */
-	if (irqd->chip == &ioapic_chip) {
-		mpd->entry.dest = cfg->dest_apicid & 0xff;
-		mpd->entry.virt_ext_dest = cfg->dest_apicid >> 8;
-		mpd->entry.vector = cfg->vector;
-	}
+	mp_swizzle_msi_dest_bits(irqd, &mpd->entry);
+
 	for_each_irq_pin(entry, mpd->irq_2_pin)
 		__ioapic_write_entry(entry->apic, entry->pin, mpd->entry);
 }
@@ -2949,15 +2965,14 @@ static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data,
 	}
 }
 
-static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
-			   struct IO_APIC_route_entry *entry)
+static void mp_setup_entry(struct irq_data *irq_data, struct mp_chip_data *data)
 {
+	struct IO_APIC_route_entry *entry = &data->entry;
+
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode = apic->irq_delivery_mode;
-	entry->dest_mode     = apic->irq_dest_mode;
-	entry->dest	     = cfg->dest_apicid & 0xff;
-	entry->virt_ext_dest = cfg->dest_apicid >> 8;
-	entry->vector	     = cfg->vector;
+
+	mp_swizzle_msi_dest_bits(irq_data, entry);
+
 	entry->trigger	     = data->trigger;
 	entry->polarity	     = data->polarity;
 	/*
@@ -2995,7 +3010,6 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	if (!data)
 		return -ENOMEM;
 
-	info->ioapic.entry = &data->entry;
 	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info);
 	if (ret < 0) {
 		kfree(data);
@@ -3013,8 +3027,7 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	add_pin_to_irq_node(data, ioapic_alloc_attr_node(info), ioapic, pin);
 
 	local_irq_save(flags);
-	if (info->ioapic.entry)
-		mp_setup_entry(cfg, data, info->ioapic.entry);
+	mp_setup_entry(irq_data, data);
 	mp_register_handler(virq, data->trigger);
 	if (virq < nr_legacy_irqs())
 		legacy_pic->mask(virq);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ef64e01f66d7..13d0a8f42d56 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3597,7 +3597,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 {
 	struct irq_2_irte *irte_info = &data->irq_2_irte;
 	struct msi_msg *msg = &data->msi_entry;
-	struct IO_APIC_route_entry *entry;
 	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
 	if (!iommu)
@@ -3611,19 +3610,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-		/* Setup IOAPIC entry */
-		entry = info->ioapic.entry;
-		info->ioapic.entry = NULL;
-		memset(entry, 0, sizeof(*entry));
-		entry->vector        = index;
-		entry->mask          = 0;
-		entry->trigger       = info->ioapic.trigger;
-		entry->polarity      = info->ioapic.polarity;
-		/* Mask level triggered irqs. */
-		if (info->ioapic.trigger)
-			entry->mask = 1;
-		break;
-
 	case X86_IRQ_ALLOC_TYPE_HPET:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e09e2d734c57..37dd485a5640 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -40,7 +40,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 {
 	struct irq_data *parent = data->parent_data;
 	struct irq_cfg *cfg = irqd_cfg(data);
-	struct IO_APIC_route_entry *entry;
 	int ret;
 
 	/* Return error If new irq affinity is out of ioapic_max_cpumask. */
@@ -51,9 +50,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
 		return ret;
 
-	entry = data->chip_data;
-	entry->dest = cfg->dest_apicid;
-	entry->vector = cfg->vector;
 	send_cleanup_vector(cfg);
 
 	return 0;
@@ -89,20 +85,6 @@ static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
 
 	irq_data->chip = &hyperv_ir_chip;
 
-	/*
-	 * If there is interrupt remapping function of IOMMU, setting irq
-	 * affinity only needs to change IRTE of IOMMU. But Hyper-V doesn't
-	 * support interrupt remapping function, setting irq affinity of IO-APIC
-	 * interrupts still needs to change IO-APIC registers. But ioapic_
-	 * configure_entry() will ignore value of cfg->vector and cfg->
-	 * dest_apicid when IO-APIC's parent irq domain is not the vector
-	 * domain.(See ioapic_configure_entry()) In order to setting vector
-	 * and dest_apicid to IO-APIC register, IO-APIC entry pointer is saved
-	 * in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_
-	 * affinity() set vector and dest_apicid directly into IO-APIC entry.
-	 */
-	irq_data->chip_data = info->ioapic.entry;
-
 	/*
 	 * Hypver-V IO APIC irq affinity should be in the scope of
 	 * ioapic_max_cpumask because no irq remapping support.
@@ -119,22 +101,9 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain,
 	irq_domain_free_irqs_common(domain, virq, nr_irqs);
 }
 
-static int hyperv_irq_remapping_activate(struct irq_domain *domain,
-			  struct irq_data *irq_data, bool reserve)
-{
-	struct irq_cfg *cfg = irqd_cfg(irq_data);
-	struct IO_APIC_route_entry *entry = irq_data->chip_data;
-
-	entry->dest = cfg->dest_apicid;
-	entry->vector = cfg->vector;
-
-	return 0;
-}
-
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
 	.alloc = hyperv_irq_remapping_alloc,
 	.free = hyperv_irq_remapping_free,
-	.activate = hyperv_irq_remapping_activate,
 };
 
 static int __init hyperv_prepare_irq_remapping(void)
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 0cfce1d3b7bb..511dfb4884bc 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1265,7 +1265,6 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 					     struct irq_alloc_info *info,
 					     int index, int sub_handle)
 {
-	struct IR_IO_APIC_route_entry *entry;
 	struct irte *irte = &data->irte_entry;
 	struct msi_msg *msg = &data->msi_entry;
 
@@ -1281,23 +1280,15 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 			irte->avail, irte->vector, irte->dest_id,
 			irte->sid, irte->sq, irte->svt);
 
-		entry = (struct IR_IO_APIC_route_entry *)info->ioapic.entry;
-		info->ioapic.entry = NULL;
-		memset(entry, 0, sizeof(*entry));
-		entry->index2	= (index >> 15) & 0x1;
-		entry->zero	= 0;
-		entry->format	= 1;
-		entry->index	= (index & 0x7fff);
 		/*
 		 * IO-APIC RTE will be configured with virtual vector.
 		 * irq handler will do the explicit EOI to the io-apic.
 		 */
-		entry->vector	= info->ioapic.pin;
-		entry->mask	= 0;			/* enable IRQ */
-		entry->trigger	= info->ioapic.trigger;
-		entry->polarity	= info->ioapic.polarity;
-		if (info->ioapic.trigger)
-			entry->mask = 1; /* Mask level triggered irqs. */
+		msg->data = info->ioapic.pin;
+		msg->address_hi = MSI_ADDR_BASE_HI;
+		msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
+				  MSI_ADDR_IR_INDEX1(index) |
+				  MSI_ADDR_IR_INDEX2(index);
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-08 23:27               ` Thomas Gleixner
  2020-10-09  6:07                 ` David Woodhouse
@ 2020-10-10 10:06                 ` David Woodhouse
  2020-10-10 11:44                   ` Thomas Gleixner
  1 sibling, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-10 10:06 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1688 bytes --]

On Fri, 2020-10-09 at 01:27 +0200, Thomas Gleixner wrote:
> On Thu, Oct 08 2020 at 22:39, David Woodhouse wrote:
> > On Thu, 2020-10-08 at 23:14 +0200, Thomas Gleixner wrote:
> > > > 
> > > > (We'd want the x86_vector_domain to actually have an MSI compose
> > > > function in the !CONFIG_PCI_MSI case if we did this, of course.)
> > > 
> > > The compose function and the vector domain wrapper can simply move to
> > > vector.c
> > 
> > I ended up putting __irq_msi_compose_msg() into apic.c and that way I
> > can make virt_ext_dest_id static in that file.
> > 
> > And then I can move all the HPET-MSI support into hpet.c too.
> 
> Works for me.
> 
> > https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id
> 
> For the next submission, can you please
> 
>  - pick up the -ENODEV changes for HPET/IOAPIC which I posted earlier

I think the world will be a nicer place if HPET and IOAPIC have their
own struct device and their drivers can just use dev_get_msi_domain().

The IRQ remapping drivers already plug into the device-add notifier and
can fill in the appropriate MSI domain just like they do¹ for PCI and
ACPI devices.

Using platform_add_bundle() for HPET looks trivial enough; I'll have a
play with that and then do IOAPIC too if/when the initialisation order
and hotplug handling all works out OK to install the correct
msi_domain.

-- 
dwmw2

¹ Yeah, I know they don't do it directly; it's done in 
  pcibios_add_device(). But maybe they should? Because yeah, I also 
  know they don't do it for ACPI devices either, because nothing does,
  and IRQ remapping on ANDD-listed devices doesn't work AFAICT.


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-10 10:06                 ` David Woodhouse
@ 2020-10-10 11:44                   ` Thomas Gleixner
  2020-10-10 11:58                     ` David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-10 11:44 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Sat, Oct 10 2020 at 11:06, David Woodhouse wrote:
> On Fri, 2020-10-09 at 01:27 +0200, Thomas Gleixner wrote:
>> On Thu, Oct 08 2020 at 22:39, David Woodhouse wrote:
>> For the next submission, can you please
>> 
>>  - pick up the -ENODEV changes for HPET/IOAPIC which I posted earlier
>
> I think the world will be a nicer place if HPET and IOAPIC have their
> own struct device and their drivers can just use dev_get_msi_domain().
>
> The IRQ remapping drivers already plug into the device-add notifier and
> can fill in the appropriate MSI domain just like they do¹ for PCI and
> ACPI devices.
>
> Using platform_add_bundle() for HPET looks trivial enough; I'll have a
> play with that and then do IOAPIC too if/when the initialisation order
> and hotplug handling all works out OK to install the correct
> msi_domain.

Yes, I was wondering about that when I made PCI at least use that
mechanism, but had not had time to actually look at it.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-10 11:44                   ` Thomas Gleixner
@ 2020-10-10 11:58                     ` David Woodhouse
  2020-10-11 17:12                       ` Thomas Gleixner
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-10 11:58 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel



On 10 October 2020 12:44:10 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Sat, Oct 10 2020 at 11:06, David Woodhouse wrote:
>> On Fri, 2020-10-09 at 01:27 +0200, Thomas Gleixner wrote:
>>> On Thu, Oct 08 2020 at 22:39, David Woodhouse wrote:
>>> For the next submission, can you please
>>> 
>>>  - pick up the -ENODEV changes for HPET/IOAPIC which I posted
>earlier
>>
>> I think the world will be a nicer place if HPET and IOAPIC have their
>> own struct device and their drivers can just use
>dev_get_msi_domain().
>>
>> The IRQ remapping drivers already plug into the device-add notifier
>and
>> can fill in the appropriate MSI domain just like they do¹ for PCI and
>> ACPI devices.
>>
>> Using platform_add_bundle() for HPET looks trivial enough; I'll have
>a
>> play with that and then do IOAPIC too if/when the initialisation
>order
>> and hotplug handling all works out OK to install the correct
>> msi_domain.
>
>Yes, I was wondering about that when I made PCI at least use that
>mechanism, but had not had time to actually look at it.

Yeah. There's some muttering to be done for HPET about whether it's *its* MSI domain or whether it's the parent domain. But I'll have a play. I think we'll be able to drop the whole irq_remapping_get_irq_domain() thing.

Either way, it's a separate cleanup and the 15-bit APIC ID series I posted yesterday should be fine as it is. 

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-10 11:58                     ` David Woodhouse
@ 2020-10-11 17:12                       ` Thomas Gleixner
  2020-10-11 21:15                         ` David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-11 17:12 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Sat, Oct 10 2020 at 12:58, David Woodhouse wrote:
> On 10 October 2020 12:44:10 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
>>On Sat, Oct 10 2020 at 11:06, David Woodhouse wrote:

>>> The IRQ remapping drivers already plug into the device-add notifier
>>> and can fill in the appropriate MSI domain just like they do¹ for
>>> PCI and ACPI devices.
>>> Using platform_add_bundle() for HPET looks trivial enough; I'll have
>>> a play with that and then do IOAPIC too if/when the initialisation
>>> order and hotplug handling all works out OK to install the correct
>>> msi_domain.
>>
>> Yes, I was wondering about that when I made PCI at least use that
>> mechanism, but had not had time to actually look at it.
>
> Yeah. There's some muttering to be done for HPET about whether it's
> *its* MSI domain or whether it's the parent domain. But I'll have a
> play. I think we'll be able to drop the whole
> irq_remapping_get_irq_domain() thing.

That would be really nice.

> Either way, it's a separate cleanup and the 15-bit APIC ID series I
> posted yesterday should be fine as it is.

I go over it in the next days once more and stick it into my devel tree
until rc1. Need to get some conflicts sorted with that Device MSI stuff.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-11 17:12                       ` Thomas Gleixner
@ 2020-10-11 21:15                         ` David Woodhouse
  2020-10-12  9:33                           ` Thomas Gleixner
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-11 21:15 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel



On 11 October 2020 18:12:08 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Sat, Oct 10 2020 at 12:58, David Woodhouse wrote:
>> On 10 October 2020 12:44:10 BST, Thomas Gleixner <tglx@linutronix.de>
>wrote:
>>>On Sat, Oct 10 2020 at 11:06, David Woodhouse wrote:
>
>>>> The IRQ remapping drivers already plug into the device-add notifier
>>>> and can fill in the appropriate MSI domain just like they do¹ for
>>>> PCI and ACPI devices.
>>>> Using platform_add_bundle() for HPET looks trivial enough; I'll
>have
>>>> a play with that and then do IOAPIC too if/when the initialisation
>>>> order and hotplug handling all works out OK to install the correct
>>>> msi_domain.
>>>
>>> Yes, I was wondering about that when I made PCI at least use that
>>> mechanism, but had not had time to actually look at it.
>>
>> Yeah. There's some muttering to be done for HPET about whether it's
>> *its* MSI domain or whether it's the parent domain. But I'll have a
>> play. I think we'll be able to drop the whole
>> irq_remapping_get_irq_domain() thing.
>
>That would be really nice.

I can make it work for HPET if I fix up the point at which the IRQ remapping code registers a notifier on the platform bus. (At IRQ remap setup time is too early; when it registers the PCI bus notifier is too late.)

IOAPIC is harder though as the platform bus doesn't even exist that early. Maybe an early platform bus is possible but it would have to turn out particularly simple to do, or I'd need to find another use case too, to justify it. Will continue to play....

>> Either way, it's a separate cleanup and the 15-bit APIC ID series I
>> posted yesterday should be fine as it is.
>
>I go over it in the next days once more and stick it into my devel tree
>until rc1. Need to get some conflicts sorted with that Device MSI
>stuff.

While playing with HPET I noticed I need s/CONFIG_PCI_MSI/CONFIG_IRQ_GENERIC_MSI/ where the variables are declared at the top of msi.c to match the change I made later on. Can post v3 of the series or you can silently fix it up as you go; please advise.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-11 21:15                         ` David Woodhouse
@ 2020-10-12  9:33                           ` Thomas Gleixner
  2020-10-12 16:06                             ` David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-12  9:33 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel

On Sun, Oct 11 2020 at 22:15, David Woodhouse wrote:
> On 11 October 2020 18:12:08 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Sat, Oct 10 2020 at 12:58, David Woodhouse wrote:
>>> On 10 October 2020 12:44:10 BST, Thomas Gleixner <tglx@linutronix.de>
>> wrote:
>>> Yeah. There's some muttering to be done for HPET about whether it's
>>> *its* MSI domain or whether it's the parent domain. But I'll have a
>>> play. I think we'll be able to drop the whole
>>> irq_remapping_get_irq_domain() thing.
>>
>> That would be really nice.
>
> I can make it work for HPET if I fix up the point at which the IRQ
> remapping code registers a notifier on the platform bus. (At IRQ remap
> setup time is too early; when it registers the PCI bus notifier is too
> late.)
>
> IOAPIC is harder though as the platform bus doesn't even exist that
> early. Maybe an early platform bus is possible but it would have to
> turn out particularly simple to do, or I'd need to find another use
> case too, to justify it. Will continue to play....

You might want to look into using irq_find_matching_fwspec() instead for
both HPET and IOAPIC. That needs a select() callback implemented in the
remapping domains.

>> I go over it in the next days once more and stick it into my devel tree
>> until rc1. Need to get some conflicts sorted with that Device MSI
>> stuff.
>
> While playing with HPET I noticed I need
> s/CONFIG_PCI_MSI/CONFIG_IRQ_GENERIC_MSI/ where the variables are
> declared at the top of msi.c to match the change I made later on. Can
> post v3 of the series or you can silently fix it up as you go; please
> advise.

I think I might be able to handle that on my own :)

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-12  9:33                           ` Thomas Gleixner
@ 2020-10-12 16:06                             ` David Woodhouse
  2020-10-12 18:38                               ` Thomas Gleixner
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-12 16:06 UTC (permalink / raw)
  To: Thomas Gleixner, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 17703 bytes --]

On Mon, 2020-10-12 at 11:33 +0200, Thomas Gleixner wrote:
> On Sun, Oct 11 2020 at 22:15, David Woodhouse wrote:
> > On 11 October 2020 18:12:08 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > On Sat, Oct 10 2020 at 12:58, David Woodhouse wrote:
> > > > On 10 October 2020 12:44:10 BST, Thomas Gleixner <tglx@linutronix.de>
> > > 
> > > wrote:
> > > > Yeah. There's some muttering to be done for HPET about whether it's
> > > > *its* MSI domain or whether it's the parent domain. But I'll have a
> > > > play. I think we'll be able to drop the whole
> > > > irq_remapping_get_irq_domain() thing.
> > > 
> > > That would be really nice.
> > 
> > I can make it work for HPET if I fix up the point at which the IRQ
> > remapping code registers a notifier on the platform bus. (At IRQ remap
> > setup time is too early; when it registers the PCI bus notifier is too
> > late.)
> > 
> > IOAPIC is harder though as the platform bus doesn't even exist that
> > early. Maybe an early platform bus is possible but it would have to
> > turn out particularly simple to do, or I'd need to find another use
> > case too, to justify it. Will continue to play....
> 
> You might want to look into using irq_find_matching_fwspec() instead for
> both HPET and IOAPIC. That needs a select() callback implemented in the
> remapping domains.

That works. Pushed (along with that trivial HPET MSI fixup) to 
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id

Briefly tested with I/OAPIC with and without Intel remapping in qemu,
not tested for HPET because I haven't worked out how to get qemu to do
HPET-MSI yet.

David Woodhouse (8):
      genirq/irqdomain: Implement get_name() method on irqchip fwnodes
      x86/apic: Add select() method on vector irqdomain
      iommu/amd: Implement select() method on remapping irqdomain
      iommu/vt-d: Implement select() method on remapping irqdomain
      iommu/hyper-v: Implement select() method on remapping irqdomain
      x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
      x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
      x86: Kill all traces of irq_remapping_get_irq_domain()

 arch/x86/include/asm/hw_irq.h        |  2 --
 arch/x86/include/asm/irq_remapping.h |  9 ------
 arch/x86/include/asm/irqdomain.h     |  3 ++
 arch/x86/kernel/apic/io_apic.c       | 21 ++++++--------
 arch/x86/kernel/apic/vector.c        | 52 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/hpet.c               | 23 +++++++++-------
 drivers/iommu/amd/iommu.c            | 53 +++++++++++++-----------------------
 drivers/iommu/hyperv-iommu.c         | 18 ++++++------
 drivers/iommu/intel/irq_remapping.c  | 30 +++++++++-----------
 drivers/iommu/irq_remapping.c        | 14 ----------
 drivers/iommu/irq_remapping.h        |  3 --
 kernel/irq/irqdomain.c               | 11 +++++++-
 12 files changed, 128 insertions(+), 111 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index aabd8f1b6bb0..aef795f17478 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -40,8 +40,6 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_PCI_MSIX,
 	X86_IRQ_ALLOC_TYPE_DMAR,
 	X86_IRQ_ALLOC_TYPE_UV,
-	X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT,
-	X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
 struct ioapic_alloc_info {
diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index af4a151d70b3..7cc49432187f 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -44,9 +44,6 @@ extern int irq_remapping_reenable(int);
 extern int irq_remap_enable_fault_handling(void);
 extern void panic_if_irq_remap(const char *msg);
 
-extern struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info);
-
 /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
 extern struct irq_domain *
 arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int id);
@@ -71,11 +68,5 @@ static inline void panic_if_irq_remap(const char *msg)
 {
 }
 
-static inline struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info)
-{
-	return NULL;
-}
-
 #endif /* CONFIG_IRQ_REMAP */
 #endif /* __X86_IRQ_REMAPPING_H */
diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
index cd684d45cb5f..2fc85f523ace 100644
--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -12,6 +12,9 @@ enum {
 	X86_IRQ_ALLOC_LEGACY				= 0x2,
 };
 
+int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec);
+int x86_fwspec_is_hpet(struct irq_fwspec *fwspec);
+
 extern struct irq_domain *x86_vector_domain;
 
 extern void init_irq_alloc_info(struct irq_alloc_info *info,
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ca2da19d5c55..f6a5b9f887aa 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2305,36 +2305,33 @@ static inline void __init check_timer(void)
 
 static int mp_irqdomain_create(int ioapic)
 {
-	struct irq_alloc_info info;
 	struct irq_domain *parent;
 	int hwirqs = mp_ioapic_pin_count(ioapic);
 	struct ioapic *ip = &ioapics[ioapic];
 	struct ioapic_domain_cfg *cfg = &ip->irqdomain_cfg;
 	struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(ioapic);
 	struct fwnode_handle *fn;
-	char *name = "IO-APIC";
+	struct irq_fwspec fwspec;
 
 	if (cfg->type == IOAPIC_DOMAIN_INVALID)
 		return 0;
 
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
-	info.devid = mpc_ioapic_id(ioapic);
-	parent = irq_remapping_get_irq_domain(&info);
-	if (!parent)
-		parent = x86_vector_domain;
-	else
-		name = "IO-APIC-IR";
-
 	/* Handle device tree enumerated APICs proper */
 	if (cfg->dev) {
 		fn = of_node_to_fwnode(cfg->dev);
 	} else {
-		fn = irq_domain_alloc_named_id_fwnode(name, ioapic);
+		fn = irq_domain_alloc_named_id_fwnode("IO-APIC", ioapic);
 		if (!fn)
 			return -ENOMEM;
 	}
 
+	fwspec.fwnode = fn;
+	fwspec.param_count = 1;
+	fwspec.param[0] = ioapic;
+	parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+	if (!parent)
+		return -ENODEV;
+
 	ip->irqdomain = irq_domain_create_linear(fn, hwirqs, cfg->ops,
 						 (void *)(long)ioapic);
 
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb2e2a2488a5..3f0485c12b13 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -636,7 +636,59 @@ static void x86_vector_debug_show(struct seq_file *m, struct irq_domain *d,
 }
 #endif
 
+
+int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec)
+{
+	if (fwspec->param_count != 1)
+		return 0;
+
+	if (is_fwnode_irqchip(fwspec->fwnode)){
+		const char *fwname = fwnode_get_name(fwspec->fwnode);
+
+		if (!strncmp(fwname, "IO-APIC-", 8) &&
+		    simple_strtol(fwname+8, NULL, 10) == fwspec->param[0])
+			return 1;
+#ifdef CONFIG_OF
+	} else if (to_of_node(fwspec->fwnode) &&
+		   of_device_is_compatible(to_of_node(fwspec->fwnode),
+					   "intel,ce4100-ioapic")) {
+		return 1;
+#endif
+	}
+	return 0;
+}
+
+int x86_fwspec_is_hpet(struct irq_fwspec *fwspec)
+{
+	if (fwspec->param_count != 1)
+		return 0;
+
+	if (is_fwnode_irqchip(fwspec->fwnode)){
+		const char *fwname = fwnode_get_name(fwspec->fwnode);
+
+		if (!strncmp(fwname, "HPET-MSI-", 9) &&
+		    simple_strtol(fwname+9, NULL, 10) == fwspec->param[0])
+			return 1;
+	}
+	return 0;
+}
+
+static int x86_vector_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+			     enum irq_domain_bus_token bus_token)
+{
+	/*
+	 * HPET and I/OAPIC cannot be parented in the vector domain
+	 * if IRQ remapping is enabled. APIC IDs above 15 bits are
+	 * only permitted if IRQ remapping is enabled, so check that.
+	 */
+	if (apic->apic_id_valid(32768))
+		return 0;
+
+	return x86_fwspec_is_ioapic(fwspec) || x86_fwspec_is_hpet(fwspec);
+}
+
 static const struct irq_domain_ops x86_vector_domain_ops = {
+	.select		= x86_vector_select,
 	.alloc		= x86_vector_alloc_irqs,
 	.free		= x86_vector_free_irqs,
 	.activate	= x86_vector_activate,
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3b8b12769f3b..fb7736ca7b5b 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -543,8 +543,8 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 {
 	struct msi_domain_info *domain_info;
 	struct irq_domain *parent, *d;
-	struct irq_alloc_info info;
 	struct fwnode_handle *fn;
+	struct irq_fwspec fwspec;
 
 	if (x86_vector_domain == NULL)
 		return NULL;
@@ -556,15 +556,6 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 	*domain_info = hpet_msi_domain_info;
 	domain_info->data = (void *)(long)hpet_id;
 
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-	info.devid = hpet_id;
-	parent = irq_remapping_get_irq_domain(&info);
-	if (parent == NULL)
-		parent = x86_vector_domain;
-	else
-		hpet_msi_controller.name = "IR-HPET-MSI";
-
 	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
 					      hpet_id);
 	if (!fn) {
@@ -572,6 +563,18 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 		return NULL;
 	}
 
+	fwspec.fwnode = fn;
+	fwspec.param_count = 1;
+	fwspec.param[0] = hpet_id;
+	parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+	if (!parent) {
+		irq_domain_free_fwnode(fn);
+		kfree(domain_info);
+		return NULL;
+	}
+	if (parent != x86_vector_domain)
+		hpet_msi_controller.name = "IR-HPET-MSI";
+
 	d = msi_create_irq_domain(fn, domain_info, parent);
 	if (!d) {
 		irq_domain_free_fwnode(fn);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 13d0a8f42d56..16adbf9d8fbb 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3536,10 +3536,8 @@ static int get_devid(struct irq_alloc_info *info)
 {
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
 		return get_ioapic_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_HPET:
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
 		return get_hpet_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
@@ -3550,46 +3548,32 @@ static int get_devid(struct irq_alloc_info *info)
 	}
 }
 
-static struct irq_domain *get_irq_domain_for_devid(struct irq_alloc_info *info,
-						   int devid)
-{
-	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
-	if (!iommu)
-		return NULL;
-
-	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		return iommu->ir_domain;
-	default:
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-}
-
-static struct irq_domain *get_irq_domain(struct irq_alloc_info *info)
-{
-	int devid;
-
-	if (!info)
-		return NULL;
-
-	devid = get_devid(info);
-	if (devid < 0)
-		return NULL;
-	return get_irq_domain_for_devid(info, devid);
-}
-
 struct irq_remap_ops amd_iommu_irq_ops = {
 	.prepare		= amd_iommu_prepare,
 	.enable			= amd_iommu_enable,
 	.disable		= amd_iommu_disable,
 	.reenable		= amd_iommu_reenable,
 	.enable_faulting	= amd_iommu_enable_faulting,
-	.get_irq_domain		= get_irq_domain,
 };
 
+static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+				enum irq_domain_bus_token bus_token)
+{
+	struct amd_iommu *iommu;
+	int devid = -1;
+
+	if (x86_fwspec_is_ioapic(fwspec))
+		devid = get_ioapic_devid(fwspec->param[0]);
+	else if (x86_fwspec_is_ioapic(fwspec))
+		devid = get_hpet_devid(fwspec->param[0]);
+
+	if (devid < 0)
+		return 0;
+
+	iommu = amd_iommu_rlookup_table[devid];
+	return iommu && iommu->ir_domain == d;
+}
+
 static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 				       struct irq_cfg *irq_cfg,
 				       struct irq_alloc_info *info,
@@ -3813,6 +3797,7 @@ static void irq_remapping_deactivate(struct irq_domain *domain,
 }
 
 static const struct irq_domain_ops amd_ir_domain_ops = {
+	.select = irq_remapping_select,
 	.alloc = irq_remapping_alloc,
 	.free = irq_remapping_free,
 	.activate = irq_remapping_activate,
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 37dd485a5640..e7ed2bb83358 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -61,6 +61,14 @@ static struct irq_chip hyperv_ir_chip = {
 	.irq_set_affinity	= hyperv_ir_set_affinity,
 };
 
+static int hyperv_irq_remapping_select(struct irq_domain *d,
+				       struct irq_fwspec *fwspec,
+				       enum irq_domain_bus_token bus_token)
+{
+	/* Claim only the first (and only) I/OAPIC */
+	return x86_fwspec_is_ioapic(fwspec) && fwspec->param[0] == 0;
+}
+
 static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
 				     unsigned int virq, unsigned int nr_irqs,
 				     void *arg)
@@ -102,6 +110,7 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain,
 }
 
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
+	.select = hyperv_irq_remapping_select,
 	.alloc = hyperv_irq_remapping_alloc,
 	.free = hyperv_irq_remapping_free,
 };
@@ -151,18 +160,9 @@ static int __init hyperv_enable_irq_remapping(void)
 	return IRQ_REMAP_X2APIC_MODE;
 }
 
-static struct irq_domain *hyperv_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT)
-		return ioapic_ir_domain;
-	else
-		return NULL;
-}
-
 struct irq_remap_ops hyperv_irq_remap_ops = {
 	.prepare		= hyperv_prepare_irq_remapping,
 	.enable			= hyperv_enable_irq_remapping,
-	.get_irq_domain		= hyperv_get_irq_domain,
 };
 
 #endif
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 511dfb4884bc..ccf61cd18f69 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1128,29 +1128,12 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 	irte->redir_hint = 1;
 }
 
-static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (!info)
-		return NULL;
-
-	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-		return map_ioapic_to_ir(info->devid);
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		return map_hpet_to_ir(info->devid);
-	default:
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-}
-
 struct irq_remap_ops intel_irq_remap_ops = {
 	.prepare		= intel_prepare_irq_remapping,
 	.enable			= intel_enable_irq_remapping,
 	.disable		= disable_irq_remapping,
 	.reenable		= reenable_irq_remapping,
 	.enable_faulting	= enable_drhd_fault_handling,
-	.get_irq_domain		= intel_get_irq_domain,
 };
 
 static void intel_ir_reconfigure_irte(struct irq_data *irqd, bool force)
@@ -1435,7 +1418,20 @@ static void intel_irq_remapping_deactivate(struct irq_domain *domain,
 	modify_irte(&data->irq_2_iommu, &entry);
 }
 
+static int intel_irq_remapping_select(struct irq_domain *d,
+				      struct irq_fwspec *fwspec,
+				      enum irq_domain_bus_token bus_token)
+{
+	if (x86_fwspec_is_ioapic(fwspec))
+		return d == map_ioapic_to_ir(fwspec->param[0]);
+	else if (x86_fwspec_is_hpet(fwspec))
+		return d == map_hpet_to_ir(fwspec->param[0]);
+
+	return 0;
+}
+
 static const struct irq_domain_ops intel_ir_domain_ops = {
+	.select = intel_irq_remapping_select,
 	.alloc = intel_irq_remapping_alloc,
 	.free = intel_irq_remapping_free,
 	.activate = intel_irq_remapping_activate,
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 2d84b1ed205e..83314b9d8f38 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -158,17 +158,3 @@ void panic_if_irq_remap(const char *msg)
 	if (irq_remapping_enabled)
 		panic(msg);
 }
-
-/**
- * irq_remapping_get_irq_domain - Get the irqdomain serving the request @info
- * @info: interrupt allocation information, used to identify the IOMMU device
- *
- * Returns pointer to IRQ domain, or NULL on failure.
- */
-struct irq_domain *irq_remapping_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (!remap_ops || !remap_ops->get_irq_domain)
-		return NULL;
-
-	return remap_ops->get_irq_domain(info);
-}
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 1661b3d75920..8c89cb947cdb 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -42,9 +42,6 @@ struct irq_remap_ops {
 
 	/* Enable fault handling */
 	int  (*enable_faulting)(void);
-
-	/* Get the irqdomain associated to IOMMU device */
-	struct irq_domain *(*get_irq_domain)(struct irq_alloc_info *);
 };
 
 extern struct irq_remap_ops intel_irq_remap_ops;
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 76cd7ebd1178..6440f97c412e 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -42,7 +42,16 @@ static inline void debugfs_add_domain_dir(struct irq_domain *d) { }
 static inline void debugfs_remove_domain_dir(struct irq_domain *d) { }
 #endif
 
-const struct fwnode_operations irqchip_fwnode_ops;
+static const char *irqchip_fwnode_get_name(const struct fwnode_handle *fwnode)
+{
+	struct irqchip_fwid *fwid = container_of(fwnode, struct irqchip_fwid, fwnode);
+
+	return fwid->name;
+}
+
+const struct fwnode_operations irqchip_fwnode_ops = {
+	.get_name = irqchip_fwnode_get_name,
+};
 EXPORT_SYMBOL_GPL(irqchip_fwnode_ops);
 
 /**

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-12 16:06                             ` David Woodhouse
@ 2020-10-12 18:38                               ` Thomas Gleixner
  2020-10-12 20:20                                 ` David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-12 18:38 UTC (permalink / raw)
  To: David Woodhouse, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

On Mon, Oct 12 2020 at 17:06, David Woodhouse wrote:
> On Mon, 2020-10-12 at 11:33 +0200, Thomas Gleixner wrote:
>> You might want to look into using irq_find_matching_fwspec() instead for
>> both HPET and IOAPIC. That needs a select() callback implemented in the
>> remapping domains.
>
> That works.

:)

Nasty, but way better than what we have now. Now I start to wonder
whether we can get rid of a few other things as well especially the
non-standard alloc_info thingy.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-12 18:38                               ` Thomas Gleixner
@ 2020-10-12 20:20                                 ` David Woodhouse
  2020-10-12 22:13                                   ` Thomas Gleixner
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-12 20:20 UTC (permalink / raw)
  To: Thomas Gleixner, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 788 bytes --]

On Mon, 2020-10-12 at 20:38 +0200, Thomas Gleixner wrote:
> On Mon, Oct 12 2020 at 17:06, David Woodhouse wrote:
> > On Mon, 2020-10-12 at 11:33 +0200, Thomas Gleixner wrote:
> > > You might want to look into using irq_find_matching_fwspec()
> > > instead for
> > > both HPET and IOAPIC. That needs a select() callback implemented
> > > in the
> > > remapping domains.
> > 
> > That works.
> 
> :)
> 
> Nasty, but way better than what we have now. 

Want me to send that out in email or is the git tree enough for now?

I've cleaned it up a little and fixed a bug in the I/OAPIC error path.

Still not entirely convinced about the apic->apic_id_valid(32768) thing
but it should work well enough, and doesn't require exporting any extra
state from apic.c that way.


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-12 20:20                                 ` David Woodhouse
@ 2020-10-12 22:13                                   ` Thomas Gleixner
  2020-10-13  7:52                                     ` David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-12 22:13 UTC (permalink / raw)
  To: David Woodhouse, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

On Mon, Oct 12 2020 at 21:20, David Woodhouse wrote:
> On Mon, 2020-10-12 at 20:38 +0200, Thomas Gleixner wrote:
>> Nasty, but way better than what we have now. 
>
> Want me to send that out in email or is the git tree enough for now?
>
> I've cleaned it up a little and fixed a bug in the I/OAPIC error path.

Mail would be nice once you are confident with the pile.

> Still not entirely convinced about the apic->apic_id_valid(32768) thing
> but it should work well enough, and doesn't require exporting any extra
> state from apic.c that way.

Yeah, that part is odd.

I really dislike the way how irq_find_matching_fwspec() works. The 'rc'
value is actually boolean despite being type 'int' and if 'rc' is not 0
then it returns the domain even if 'rc' is an error code.

But that does not allow to return error codes from a domain match() /
select() callback which is what we really want to express that there is
something fishy.

Something like the below perhaps? Needs more thought obviously.

Marc, any opinion ?

Thanks,

        tglx

8<-------------

arch/x86/kernel/apic/vector.c |   18 ++++++++++++++++++
 include/linux/irqdomain.h     |    1 +
 kernel/irq/irqdomain.c        |    2 +-
 3 files changed, 20 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -593,6 +593,24 @@ static int x86_vector_alloc_irqs(struct
 	return err;
 }
 
+struct irq_domain *x86_select_parent_domain(struct irq_fwspec *fwspec)
+{
+	struct irq_domain *dom;
+
+	/*
+	 * If Interrupt Remapping is enabled then the match function
+	 * returns either the remapping domain or an error code if the
+	 * device is not registered with the remapping unit.
+	 *
+	 * If remapping is not enabled, then the function returns NULL.
+	 */
+	dom = irq_find_matching_fwspec(fwspec, DOMAIN_BUS_IR);
+	if (dom)
+		return IS_ERR(dom) ? NULL : dom;
+
+	return x86_vector_domain;
+}
+
 #ifdef CONFIG_GENERIC_IRQ_DEBUGFS
 static void x86_vector_debug_show(struct seq_file *m, struct irq_domain *d,
 				  struct irq_data *irqd, int ind)
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -85,6 +85,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_TI_SCI_INTA_MSI,
 	DOMAIN_BUS_WAKEUP,
 	DOMAIN_BUS_VMD_MSI,
+	DOMAIN_BUS_IR,
 };
 
 /**
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -395,7 +395,7 @@ struct irq_domain *irq_find_matching_fws
 			       (h->bus_token == bus_token)));
 
 		if (rc) {
-			found = h;
+			found = rc < 0 ? ERR_PTR(rc) : h;
 			break;
 		}
 	}




^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-12 22:13                                   ` Thomas Gleixner
@ 2020-10-13  7:52                                     ` David Woodhouse
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
  2020-10-13  9:28                                       ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID Thomas Gleixner
  0 siblings, 2 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-13  7:52 UTC (permalink / raw)
  To: Thomas Gleixner, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2046 bytes --]

On Tue, 2020-10-13 at 00:13 +0200, Thomas Gleixner wrote:
> On Mon, Oct 12 2020 at 21:20, David Woodhouse wrote:
> > On Mon, 2020-10-12 at 20:38 +0200, Thomas Gleixner wrote:
> > > Nasty, but way better than what we have now. 
> > 
> > Want me to send that out in email or is the git tree enough for now?
> > 
> > I've cleaned it up a little and fixed a bug in the I/OAPIC error path.
> 
> Mail would be nice once you are confident with the pile.

After the next cup of coffee. Will send it as an incremental series on
top of my previous ext_dest_id series.

> > Still not entirely convinced about the apic->apic_id_valid(32768) thing
> > but it should work well enough, and doesn't require exporting any extra
> > state from apic.c that way.
> 
> Yeah, that part is odd.
> 
> I really dislike the way how irq_find_matching_fwspec() works. The 'rc'
> value is actually boolean despite being type 'int' and if 'rc' is not 0
> then it returns the domain even if 'rc' is an error code.
> 
> But that does not allow to return error codes from a domain match() /
> select() callback which is what we really want to express that there is
> something fishy.

I don't know that we need to return an explicit error. I made
x86_vector_select() refuse to match if IRQ remapping is enabled, which
means that the irq_find_matching_fwspec() call will fail by returning
NULL in precisely the cases it should. (Which should never happen; qv).

That failure means HPET will refuse to set up MSI, or I/OAPIC will
refuse to initialise (later causing a BUG and a failure to boot, which
is probably the correct thing to do).

It's all fine.

+       dom = irq_find_matching_fwspec(fwspec, DOMAIN_BUS_IR);
+       if (dom)
+               return IS_ERR(dom) ? NULL : dom;
+
+       return x86_vector_domain;
+}

Ick. There's no need for that.

Eliminating that awful "if not found then slip the x86_vector_domain in
as a special case" was the whole *point* of using
irq_find_matching_fwspec() in the first place.



[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH 0/9] Remove irq_remapping_get_irq_domain()
  2020-10-13  7:52                                     ` David Woodhouse
@ 2020-10-13  8:11                                       ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 1/9] genirq/irqdomain: Implement get_name() method on irqchip fwnodes David Woodhouse
                                                           ` (8 more replies)
  2020-10-13  9:28                                       ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID Thomas Gleixner
  1 sibling, 9 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

I didn't much like the I/OAPIC and HPET drivers having magical knowledge
that they had to substitute x86_vector_domain if their call to
irq_remapping_get_irq_domain() returned NULL.

When Thomas tried to make it handle error returns from …get_irq_domain() 
distinctly from the NULL case too, it made me even sadder. So I killed 
it with fire.

Now they just use irq_find_matching_fwspec() to find an appropriate
irqdomain. Each remapping irqdomain just needs to say 'yep, that's me'
for the HPETs or I/OAPICs which are within their scope, while the
x86_vector_domain accepts them all but only if interrupt remapping
is *disabled*. No more special knowledge in the caller.

If IR is enabled and there's a child device which escapes the scope of
all remapping units, it gets NULL for its parent irqdomain and will
fail to initialise, which is the correct thing to do in that "should
never happen" case. For HPET that'll mean that it just doesn't support
MSI, while I/OAPIC will refuse to initialise and trigger a BUG_ON
because Linux quite likes it when *all* the I/OAPICs it knows about get
initialised successfully.

This is on top of the previous 'ext_dest_id' series at
https://patchwork.kernel.org/project/kvm/list/?series=362037

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id

David Woodhouse (9):
      genirq/irqdomain: Implement get_name() method on irqchip fwnodes
      x86/apic: Add select() method on vector irqdomain
      iommu/amd: Implement select() method on remapping irqdomain
      iommu/vt-d: Implement select() method on remapping irqdomain
      iommu/hyper-v: Implement select() method on remapping irqdomain
      x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
      x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
      x86: Kill all traces of irq_remapping_get_irq_domain()
      iommu/vt-d: Simplify intel_irq_remapping_select()

 arch/x86/include/asm/hw_irq.h        |  2 --
 arch/x86/include/asm/irq_remapping.h |  9 ---------
 arch/x86/include/asm/irqdomain.h     |  3 +++
 arch/x86/kernel/apic/io_apic.c       | 24 ++++++++++++------------
 arch/x86/kernel/apic/vector.c        | 43 +++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/hpet.c               | 23 +++++++++++++----------
 drivers/iommu/amd/iommu.c            | 53 +++++++++++++++++++----------------------------------
 drivers/iommu/hyperv-iommu.c         | 18 +++++++++---------
 drivers/iommu/intel/irq_remapping.c  | 43 +++++++++++++++++--------------------------
 drivers/iommu/irq_remapping.c        | 14 --------------
 drivers/iommu/irq_remapping.h        |  3 ---
 kernel/irq/irqdomain.c               | 11 ++++++++++-
 12 files changed, 126 insertions(+), 120 deletions(-)




^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH 1/9] genirq/irqdomain: Implement get_name() method on irqchip fwnodes
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 2/9] x86/apic: Add select() method on vector irqdomain David Woodhouse
                                                           ` (7 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 kernel/irq/irqdomain.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 76cd7ebd1178..6440f97c412e 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -42,7 +42,16 @@ static inline void debugfs_add_domain_dir(struct irq_domain *d) { }
 static inline void debugfs_remove_domain_dir(struct irq_domain *d) { }
 #endif
 
-const struct fwnode_operations irqchip_fwnode_ops;
+static const char *irqchip_fwnode_get_name(const struct fwnode_handle *fwnode)
+{
+	struct irqchip_fwid *fwid = container_of(fwnode, struct irqchip_fwid, fwnode);
+
+	return fwid->name;
+}
+
+const struct fwnode_operations irqchip_fwnode_ops = {
+	.get_name = irqchip_fwnode_get_name,
+};
 EXPORT_SYMBOL_GPL(irqchip_fwnode_ops);
 
 /**
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 2/9] x86/apic: Add select() method on vector irqdomain
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 1/9] genirq/irqdomain: Implement get_name() method on irqchip fwnodes David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 3/9] iommu/amd: Implement select() method on remapping irqdomain David Woodhouse
                                                           ` (6 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

This will be used to select the irqdomain for I/OAPIC and HPET.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/irqdomain.h |  3 +++
 arch/x86/kernel/apic/vector.c    | 43 ++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
index cd684d45cb5f..125c23b7bad3 100644
--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -12,6 +12,9 @@ enum {
 	X86_IRQ_ALLOC_LEGACY				= 0x2,
 };
 
+extern int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec);
+extern int x86_fwspec_is_hpet(struct irq_fwspec *fwspec);
+
 extern struct irq_domain *x86_vector_domain;
 
 extern void init_irq_alloc_info(struct irq_alloc_info *info,
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb2e2a2488a5..b9b05caa28a4 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -636,7 +636,50 @@ static void x86_vector_debug_show(struct seq_file *m, struct irq_domain *d,
 }
 #endif
 
+int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec)
+{
+	if (fwspec->param_count != 1)
+		return 0;
+
+	if (is_fwnode_irqchip(fwspec->fwnode)) {
+		const char *fwname = fwnode_get_name(fwspec->fwnode);
+		return fwname && !strncmp(fwname, "IO-APIC-", 8) &&
+			simple_strtol(fwname+8, NULL, 10) == fwspec->param[0];
+	}
+	return to_of_node(fwspec->fwnode) &&
+		of_device_is_compatible(to_of_node(fwspec->fwnode),
+					"intel,ce4100-ioapic");
+}
+
+int x86_fwspec_is_hpet(struct irq_fwspec *fwspec)
+{
+	if (fwspec->param_count != 1)
+		return 0;
+
+	if (is_fwnode_irqchip(fwspec->fwnode)) {
+		const char *fwname = fwnode_get_name(fwspec->fwnode);
+		return fwname && !strncmp(fwname, "HPET-MSI-", 9) &&
+			simple_strtol(fwname+9, NULL, 10) == fwspec->param[0];
+	}
+	return 0;
+}
+
+static int x86_vector_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+			     enum irq_domain_bus_token bus_token)
+{
+	/*
+	 * HPET and I/OAPIC cannot be parented in the vector domain
+	 * if IRQ remapping is enabled. APIC IDs above 15 bits are
+	 * only permitted if IRQ remapping is enabled, so check that.
+	 */
+	if (apic->apic_id_valid(32768))
+		return 0;
+
+	return x86_fwspec_is_ioapic(fwspec) || x86_fwspec_is_hpet(fwspec);
+}
+
 static const struct irq_domain_ops x86_vector_domain_ops = {
+	.select		= x86_vector_select,
 	.alloc		= x86_vector_alloc_irqs,
 	.free		= x86_vector_free_irqs,
 	.activate	= x86_vector_activate,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 3/9] iommu/amd: Implement select() method on remapping irqdomain
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 1/9] genirq/irqdomain: Implement get_name() method on irqchip fwnodes David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 2/9] x86/apic: Add select() method on vector irqdomain David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 4/9] iommu/vt-d: " David Woodhouse
                                                           ` (5 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/iommu/amd/iommu.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 13d0a8f42d56..7ecebc5d255f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3590,6 +3590,24 @@ struct irq_remap_ops amd_iommu_irq_ops = {
 	.get_irq_domain		= get_irq_domain,
 };
 
+static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+				enum irq_domain_bus_token bus_token)
+{
+	struct amd_iommu *iommu;
+	int devid = -1;
+
+	if (x86_fwspec_is_ioapic(fwspec))
+		devid = get_ioapic_devid(fwspec->param[0]);
+	else if (x86_fwspec_is_ioapic(fwspec))
+		devid = get_hpet_devid(fwspec->param[0]);
+
+	if (devid < 0)
+		return 0;
+
+	iommu = amd_iommu_rlookup_table[devid];
+	return iommu && iommu->ir_domain == d;
+}
+
 static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 				       struct irq_cfg *irq_cfg,
 				       struct irq_alloc_info *info,
@@ -3813,6 +3831,7 @@ static void irq_remapping_deactivate(struct irq_domain *domain,
 }
 
 static const struct irq_domain_ops amd_ir_domain_ops = {
+	.select = irq_remapping_select,
 	.alloc = irq_remapping_alloc,
 	.free = irq_remapping_free,
 	.activate = irq_remapping_activate,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 4/9] iommu/vt-d: Implement select() method on remapping irqdomain
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
                                                           ` (2 preceding siblings ...)
  2020-10-13  8:11                                         ` [PATCH 3/9] iommu/amd: Implement select() method on remapping irqdomain David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 5/9] iommu/hyper-v: " David Woodhouse
                                                           ` (4 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/iommu/intel/irq_remapping.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 511dfb4884bc..40c2fec122b8 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1435,7 +1435,20 @@ static void intel_irq_remapping_deactivate(struct irq_domain *domain,
 	modify_irte(&data->irq_2_iommu, &entry);
 }
 
+static int intel_irq_remapping_select(struct irq_domain *d,
+				      struct irq_fwspec *fwspec,
+				      enum irq_domain_bus_token bus_token)
+{
+	if (x86_fwspec_is_ioapic(fwspec))
+		return d == map_ioapic_to_ir(fwspec->param[0]);
+	else if (x86_fwspec_is_hpet(fwspec))
+		return d == map_hpet_to_ir(fwspec->param[0]);
+
+	return 0;
+}
+
 static const struct irq_domain_ops intel_ir_domain_ops = {
+	.select = intel_irq_remapping_select,
 	.alloc = intel_irq_remapping_alloc,
 	.free = intel_irq_remapping_free,
 	.activate = intel_irq_remapping_activate,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 5/9] iommu/hyper-v: Implement select() method on remapping irqdomain
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
                                                           ` (3 preceding siblings ...)
  2020-10-13  8:11                                         ` [PATCH 4/9] iommu/vt-d: " David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 6/9] x86/hpet: Use irq_find_matching_fwspec() to find " David Woodhouse
                                                           ` (3 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/iommu/hyperv-iommu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 37dd485a5640..6a8966fbc3bd 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -61,6 +61,14 @@ static struct irq_chip hyperv_ir_chip = {
 	.irq_set_affinity	= hyperv_ir_set_affinity,
 };
 
+static int hyperv_irq_remapping_select(struct irq_domain *d,
+				       struct irq_fwspec *fwspec,
+				       enum irq_domain_bus_token bus_token)
+{
+	/* Claim only the first (and only) I/OAPIC */
+	return x86_fwspec_is_ioapic(fwspec) && fwspec->param[0] == 0;
+}
+
 static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
 				     unsigned int virq, unsigned int nr_irqs,
 				     void *arg)
@@ -102,6 +110,7 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain,
 }
 
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
+	.select = hyperv_irq_remapping_select,
 	.alloc = hyperv_irq_remapping_alloc,
 	.free = hyperv_irq_remapping_free,
 };
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 6/9] x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
                                                           ` (4 preceding siblings ...)
  2020-10-13  8:11                                         ` [PATCH 5/9] iommu/hyper-v: " David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 7/9] x86/ioapic: " David Woodhouse
                                                           ` (2 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kernel/hpet.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3b8b12769f3b..fb7736ca7b5b 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -543,8 +543,8 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 {
 	struct msi_domain_info *domain_info;
 	struct irq_domain *parent, *d;
-	struct irq_alloc_info info;
 	struct fwnode_handle *fn;
+	struct irq_fwspec fwspec;
 
 	if (x86_vector_domain == NULL)
 		return NULL;
@@ -556,15 +556,6 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 	*domain_info = hpet_msi_domain_info;
 	domain_info->data = (void *)(long)hpet_id;
 
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-	info.devid = hpet_id;
-	parent = irq_remapping_get_irq_domain(&info);
-	if (parent == NULL)
-		parent = x86_vector_domain;
-	else
-		hpet_msi_controller.name = "IR-HPET-MSI";
-
 	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
 					      hpet_id);
 	if (!fn) {
@@ -572,6 +563,18 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 		return NULL;
 	}
 
+	fwspec.fwnode = fn;
+	fwspec.param_count = 1;
+	fwspec.param[0] = hpet_id;
+	parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+	if (!parent) {
+		irq_domain_free_fwnode(fn);
+		kfree(domain_info);
+		return NULL;
+	}
+	if (parent != x86_vector_domain)
+		hpet_msi_controller.name = "IR-HPET-MSI";
+
 	d = msi_create_irq_domain(fn, domain_info, parent);
 	if (!d) {
 		irq_domain_free_fwnode(fn);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 7/9] x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
                                                           ` (5 preceding siblings ...)
  2020-10-13  8:11                                         ` [PATCH 6/9] x86/hpet: Use irq_find_matching_fwspec() to find " David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 8/9] x86: Kill all traces of irq_remapping_get_irq_domain() David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 9/9] iommu/vt-d: Simplify intel_irq_remapping_select() David Woodhouse
  8 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kernel/apic/io_apic.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ca2da19d5c55..73cacc92c3bb 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2305,36 +2305,36 @@ static inline void __init check_timer(void)
 
 static int mp_irqdomain_create(int ioapic)
 {
-	struct irq_alloc_info info;
 	struct irq_domain *parent;
 	int hwirqs = mp_ioapic_pin_count(ioapic);
 	struct ioapic *ip = &ioapics[ioapic];
 	struct ioapic_domain_cfg *cfg = &ip->irqdomain_cfg;
 	struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(ioapic);
 	struct fwnode_handle *fn;
-	char *name = "IO-APIC";
+	struct irq_fwspec fwspec;
 
 	if (cfg->type == IOAPIC_DOMAIN_INVALID)
 		return 0;
 
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
-	info.devid = mpc_ioapic_id(ioapic);
-	parent = irq_remapping_get_irq_domain(&info);
-	if (!parent)
-		parent = x86_vector_domain;
-	else
-		name = "IO-APIC-IR";
-
 	/* Handle device tree enumerated APICs proper */
 	if (cfg->dev) {
 		fn = of_node_to_fwnode(cfg->dev);
 	} else {
-		fn = irq_domain_alloc_named_id_fwnode(name, ioapic);
+		fn = irq_domain_alloc_named_id_fwnode("IO-APIC", ioapic);
 		if (!fn)
 			return -ENOMEM;
 	}
 
+	fwspec.fwnode = fn;
+	fwspec.param_count = 1;
+	fwspec.param[0] = ioapic;
+	parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+	if (!parent) {
+		if (!cfg->dev)
+			irq_domain_free_fwnode(fn);
+		return -ENODEV;
+	}
+
 	ip->irqdomain = irq_domain_create_linear(fn, hwirqs, cfg->ops,
 						 (void *)(long)ioapic);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 8/9] x86: Kill all traces of irq_remapping_get_irq_domain()
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
                                                           ` (6 preceding siblings ...)
  2020-10-13  8:11                                         ` [PATCH 7/9] x86/ioapic: " David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  2020-10-13  8:11                                         ` [PATCH 9/9] iommu/vt-d: Simplify intel_irq_remapping_select() David Woodhouse
  8 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/hw_irq.h        |  2 --
 arch/x86/include/asm/irq_remapping.h |  9 --------
 drivers/iommu/amd/iommu.c            | 34 ----------------------------
 drivers/iommu/hyperv-iommu.c         |  9 --------
 drivers/iommu/intel/irq_remapping.c  | 17 --------------
 drivers/iommu/irq_remapping.c        | 14 ------------
 drivers/iommu/irq_remapping.h        |  3 ---
 7 files changed, 88 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index aabd8f1b6bb0..aef795f17478 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -40,8 +40,6 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_PCI_MSIX,
 	X86_IRQ_ALLOC_TYPE_DMAR,
 	X86_IRQ_ALLOC_TYPE_UV,
-	X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT,
-	X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
 struct ioapic_alloc_info {
diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index af4a151d70b3..7cc49432187f 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -44,9 +44,6 @@ extern int irq_remapping_reenable(int);
 extern int irq_remap_enable_fault_handling(void);
 extern void panic_if_irq_remap(const char *msg);
 
-extern struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info);
-
 /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
 extern struct irq_domain *
 arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int id);
@@ -71,11 +68,5 @@ static inline void panic_if_irq_remap(const char *msg)
 {
 }
 
-static inline struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info)
-{
-	return NULL;
-}
-
 #endif /* CONFIG_IRQ_REMAP */
 #endif /* __X86_IRQ_REMAPPING_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7ecebc5d255f..16adbf9d8fbb 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3536,10 +3536,8 @@ static int get_devid(struct irq_alloc_info *info)
 {
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
 		return get_ioapic_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_HPET:
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
 		return get_hpet_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
@@ -3550,44 +3548,12 @@ static int get_devid(struct irq_alloc_info *info)
 	}
 }
 
-static struct irq_domain *get_irq_domain_for_devid(struct irq_alloc_info *info,
-						   int devid)
-{
-	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
-	if (!iommu)
-		return NULL;
-
-	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		return iommu->ir_domain;
-	default:
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-}
-
-static struct irq_domain *get_irq_domain(struct irq_alloc_info *info)
-{
-	int devid;
-
-	if (!info)
-		return NULL;
-
-	devid = get_devid(info);
-	if (devid < 0)
-		return NULL;
-	return get_irq_domain_for_devid(info, devid);
-}
-
 struct irq_remap_ops amd_iommu_irq_ops = {
 	.prepare		= amd_iommu_prepare,
 	.enable			= amd_iommu_enable,
 	.disable		= amd_iommu_disable,
 	.reenable		= amd_iommu_reenable,
 	.enable_faulting	= amd_iommu_enable_faulting,
-	.get_irq_domain		= get_irq_domain,
 };
 
 static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 6a8966fbc3bd..e7ed2bb83358 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -160,18 +160,9 @@ static int __init hyperv_enable_irq_remapping(void)
 	return IRQ_REMAP_X2APIC_MODE;
 }
 
-static struct irq_domain *hyperv_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT)
-		return ioapic_ir_domain;
-	else
-		return NULL;
-}
-
 struct irq_remap_ops hyperv_irq_remap_ops = {
 	.prepare		= hyperv_prepare_irq_remapping,
 	.enable			= hyperv_enable_irq_remapping,
-	.get_irq_domain		= hyperv_get_irq_domain,
 };
 
 #endif
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 40c2fec122b8..ccf61cd18f69 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1128,29 +1128,12 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 	irte->redir_hint = 1;
 }
 
-static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (!info)
-		return NULL;
-
-	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-		return map_ioapic_to_ir(info->devid);
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		return map_hpet_to_ir(info->devid);
-	default:
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-}
-
 struct irq_remap_ops intel_irq_remap_ops = {
 	.prepare		= intel_prepare_irq_remapping,
 	.enable			= intel_enable_irq_remapping,
 	.disable		= disable_irq_remapping,
 	.reenable		= reenable_irq_remapping,
 	.enable_faulting	= enable_drhd_fault_handling,
-	.get_irq_domain		= intel_get_irq_domain,
 };
 
 static void intel_ir_reconfigure_irte(struct irq_data *irqd, bool force)
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 2d84b1ed205e..83314b9d8f38 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -158,17 +158,3 @@ void panic_if_irq_remap(const char *msg)
 	if (irq_remapping_enabled)
 		panic(msg);
 }
-
-/**
- * irq_remapping_get_irq_domain - Get the irqdomain serving the request @info
- * @info: interrupt allocation information, used to identify the IOMMU device
- *
- * Returns pointer to IRQ domain, or NULL on failure.
- */
-struct irq_domain *irq_remapping_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (!remap_ops || !remap_ops->get_irq_domain)
-		return NULL;
-
-	return remap_ops->get_irq_domain(info);
-}
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 1661b3d75920..8c89cb947cdb 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -42,9 +42,6 @@ struct irq_remap_ops {
 
 	/* Enable fault handling */
 	int  (*enable_faulting)(void);
-
-	/* Get the irqdomain associated to IOMMU device */
-	struct irq_domain *(*get_irq_domain)(struct irq_alloc_info *);
 };
 
 extern struct irq_remap_ops intel_irq_remap_ops;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 9/9] iommu/vt-d: Simplify intel_irq_remapping_select()
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
                                                           ` (7 preceding siblings ...)
  2020-10-13  8:11                                         ` [PATCH 8/9] x86: Kill all traces of irq_remapping_get_irq_domain() David Woodhouse
@ 2020-10-13  8:11                                         ` David Woodhouse
  8 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-13  8:11 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz

From: David Woodhouse <dwmw@amazon.co.uk>

Now that the old get_irq_domain() method has gone, we can consolidate on
just the map_XXX_to_iommu() functions.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/iommu/intel/irq_remapping.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index ccf61cd18f69..8a41bcb10e26 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -204,13 +204,13 @@ static int modify_irte(struct irq_2_iommu *irq_iommu,
 	return rc;
 }
 
-static struct irq_domain *map_hpet_to_ir(u8 hpet_id)
+static struct intel_iommu *map_hpet_to_iommu(u8 hpet_id)
 {
 	int i;
 
 	for (i = 0; i < MAX_HPET_TBS; i++) {
 		if (ir_hpet[i].id == hpet_id && ir_hpet[i].iommu)
-			return ir_hpet[i].iommu->ir_domain;
+			return ir_hpet[i].iommu;
 	}
 	return NULL;
 }
@@ -226,13 +226,6 @@ static struct intel_iommu *map_ioapic_to_iommu(int apic)
 	return NULL;
 }
 
-static struct irq_domain *map_ioapic_to_ir(int apic)
-{
-	struct intel_iommu *iommu = map_ioapic_to_iommu(apic);
-
-	return iommu ? iommu->ir_domain : NULL;
-}
-
 static struct irq_domain *map_dev_to_ir(struct pci_dev *dev)
 {
 	struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev);
@@ -1422,12 +1415,14 @@ static int intel_irq_remapping_select(struct irq_domain *d,
 				      struct irq_fwspec *fwspec,
 				      enum irq_domain_bus_token bus_token)
 {
+	struct intel_iommu *iommu = NULL;
+
 	if (x86_fwspec_is_ioapic(fwspec))
-		return d == map_ioapic_to_ir(fwspec->param[0]);
+		iommu = map_ioapic_to_iommu(fwspec->param[0]);
 	else if (x86_fwspec_is_hpet(fwspec))
-		return d == map_hpet_to_ir(fwspec->param[0]);
+		iommu = map_hpet_to_iommu(fwspec->param[0]);
 
-	return 0;
+	return iommu && d == iommu->ir_domain;
 }
 
 static const struct irq_domain_ops intel_ir_domain_ops = {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-13  7:52                                     ` David Woodhouse
  2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
@ 2020-10-13  9:28                                       ` Thomas Gleixner
  2020-10-13 10:15                                         ` David Woodhouse
  2020-10-13 10:46                                         ` Thomas Gleixner
  1 sibling, 2 replies; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-13  9:28 UTC (permalink / raw)
  To: David Woodhouse, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

On Tue, Oct 13 2020 at 08:52, David Woodhouse wrote:
> On Tue, 2020-10-13 at 00:13 +0200, Thomas Gleixner wrote:
> +       dom = irq_find_matching_fwspec(fwspec, DOMAIN_BUS_IR);
> +       if (dom)
> +               return IS_ERR(dom) ? NULL : dom;
> +
> +       return x86_vector_domain;
> +}
>
> Ick. There's no need for that.
>
> Eliminating that awful "if not found then slip the x86_vector_domain in
> as a special case" was the whole *point* of using
> irq_find_matching_fwspec() in the first place.

The point was to get rid of irq_remapping_get_irq_domain().

And TBH,

        if (apicid_valid(32768))

is just another way to slip the vector domain in. It's just differently
awful.

Having an explicit answer from the search for IR:

    - Here is the domain
    - Your device is not registered properly
    - IR not enabled or not supported

is way more obvious than the above disguised is_remapping_enabled()
check.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-13  9:28                                       ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID Thomas Gleixner
@ 2020-10-13 10:15                                         ` David Woodhouse
  2020-10-13 10:46                                         ` Thomas Gleixner
  1 sibling, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-13 10:15 UTC (permalink / raw)
  To: Thomas Gleixner, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2575 bytes --]

On Tue, 2020-10-13 at 11:28 +0200, Thomas Gleixner wrote:
> On Tue, Oct 13 2020 at 08:52, David Woodhouse wrote:
> > On Tue, 2020-10-13 at 00:13 +0200, Thomas Gleixner wrote:
> > +       dom = irq_find_matching_fwspec(fwspec, DOMAIN_BUS_IR);
> > +       if (dom)
> > +               return IS_ERR(dom) ? NULL : dom;
> > +
> > +       return x86_vector_domain;
> > +}
> > 
> > Ick. There's no need for that.
> > 
> > Eliminating that awful "if not found then slip the x86_vector_domain in
> > as a special case" was the whole *point* of using
> > irq_find_matching_fwspec() in the first place.
> 
> The point was to get rid of irq_remapping_get_irq_domain().

My reason for doing it was to get rid of irq_remapping_get_irq_domain()
*because* I hated the special-casing and magical slipping in of
x86_vector_domain when it returned NULL.

> And TBH,
> 
>         if (apicid_valid(32768))
> 
> is just another way to slip the vector domain in. It's just differently
> awful.

For me, that's very much not just "slipping the vector domain in".
That's the vector domain returning true in its *own* ->select()
function, in the circumstances where it wants to be used.

The key difference is that nobody needs an external magic pointer to
the x86_vector_domain. In a true irqdomain hierarchy system, shouldn't
we be trying to eliminate *all* those magic pointers to specific
domains, if we can?

And sure, the apicid_valid(32768) as a proxy for irq_remapping_enabled
is a bit of an ugly trick. I suppose we can explicitly expose
irq_remapping_enabled from drivers/iommu if we wanted to.

> Having an explicit answer from the search for IR:
> 
>     - Here is the domain
>     - Your device is not registered properly
>     - IR not enabled or not supported
> 
> is way more obvious than the above disguised is_remapping_enabled()
> check.

I just don't even like thinking of it as a 'search for IR'.

HPET shouldn't be caring about IR any more than PCI devices do. It just
wants its parent irqdomain, that's all.

For I/OAPIC there's the slight complexity that it does actually ack
level-triggered interrupts differently when it's behind IR. But I don't
think we need a whole separate irq_chip for that; surely it could be
handled internally in ioapic_ack_level() ? 

Either way, even with that slight hack it's nicer to think of
mp_irqdomain_create() just wanting to find its parent domain, without
any special knowledge of IR and falling back to x86_parent_domain. The
hack for IR level-ack is then self-contained.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-13  9:28                                       ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID Thomas Gleixner
  2020-10-13 10:15                                         ` David Woodhouse
@ 2020-10-13 10:46                                         ` Thomas Gleixner
  2020-10-13 10:53                                           ` David Woodhouse
  1 sibling, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-13 10:46 UTC (permalink / raw)
  To: David Woodhouse, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

On Tue, Oct 13 2020 at 11:28, Thomas Gleixner wrote:
> On Tue, Oct 13 2020 at 08:52, David Woodhouse wrote:
>> On Tue, 2020-10-13 at 00:13 +0200, Thomas Gleixner wrote:
>> +       dom = irq_find_matching_fwspec(fwspec, DOMAIN_BUS_IR);
>> +       if (dom)
>> +               return IS_ERR(dom) ? NULL : dom;
>> +
>> +       return x86_vector_domain;
>> +}
>>
>> Ick. There's no need for that.
>>
>> Eliminating that awful "if not found then slip the x86_vector_domain in
>> as a special case" was the whole *point* of using
>> irq_find_matching_fwspec() in the first place.
>
> The point was to get rid of irq_remapping_get_irq_domain().
>
> And TBH,
>
>         if (apicid_valid(32768))
>
> is just another way to slip the vector domain in. It's just differently
> awful.
>
> Having an explicit answer from the search for IR:
>
>     - Here is the domain
>     - Your device is not registered properly
>     - IR not enabled or not supported
>
> is way more obvious than the above disguised is_remapping_enabled()
> check.

And after becoming more awake, that wont work anyway because there is
more than one IR domain, so there is no way to return an error "You
forgot to register" obviously.

But the APIC id (32768) valid check is also broken because IR can be
enabled even without X2APIC.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-13 10:46                                         ` Thomas Gleixner
@ 2020-10-13 10:53                                           ` David Woodhouse
  2020-10-13 11:51                                             ` David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-13 10:53 UTC (permalink / raw)
  To: Thomas Gleixner, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 773 bytes --]

On Tue, 2020-10-13 at 12:46 +0200, Thomas Gleixner wrote:
> And after becoming more awake, that wont work anyway because there is
> more than one IR domain, so there is no way to return an error "You
> forgot to register" obviously.
> 
> But the APIC id (32768) valid check is also broken because IR can be
> enabled even without X2APIC.

Nope, it's perfectly OK to allow HPET and I/OAPIC to be parented in the
x86_vector_domain in that case, regardless of IR.

The *actual* criterion for x86_vector_select() returning zero to say 
"don't use me" is "could there be CPUs in the system which can't be
reached through x86_vector_msi_compose_msg()?". It's not really about
IR at all.

The apic_id_valid(32768) check is checking precisely the right thing.



[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-13 10:53                                           ` David Woodhouse
@ 2020-10-13 11:51                                             ` David Woodhouse
  2020-10-13 12:40                                               ` Thomas Gleixner
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-13 11:51 UTC (permalink / raw)
  To: Thomas Gleixner, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2458 bytes --]

On Tue, 2020-10-13 at 11:53 +0100, David Woodhouse wrote:
> On Tue, 2020-10-13 at 12:46 +0200, Thomas Gleixner wrote:
> > And after becoming more awake, that wont work anyway because there is
> > more than one IR domain, so there is no way to return an error "You
> > forgot to register" obviously.
> > 
> > But the APIC id (32768) valid check is also broken because IR can be
> > enabled even without X2APIC.
> 
> Nope, it's perfectly OK to allow HPET and I/OAPIC to be parented in the
> x86_vector_domain in that case, regardless of IR.
> 
> The *actual* criterion for x86_vector_select() returning zero to say 
> "don't use me" is "could there be CPUs in the system which can't be
> reached through x86_vector_msi_compose_msg()?". It's not really about
> IR at all.
> 
> The apic_id_valid(32768) check is checking precisely the right thing.

With that realisation, I've fixed the comment in my ext_dest_id branch
to remove all mention of IRQ remapping. It now looks like this:

static int x86_vector_select(struct irq_domain *d, struct irq_fwspec *fwspec,
			     enum irq_domain_bus_token bus_token)
{
	/*
	 * HPET and I/OAPIC drivers use irq_find_matching_irqdomain()
	 * to find their parent irqdomain. For x86_vector_domain to be
	 * suitable, all CPUs in the system must be reachable by its
	 * x86_vector_msi_compose_msg() function. Which is only true
	 * in !x2apic mode, or in x2apic physical mode if APIC IDs were
	 * restricted to 8 or 15 bits at boot time. In those cases,
	 * 1<<15 will *not* be a valid APIC ID.
	 */
	if (apic->apic_id_valid(1<<15))
		return 0;

	return x86_fwspec_is_ioapic(fwspec) || x86_fwspec_is_hpet(fwspec);
}

That makes it clearer that this isn't just some incestuous interaction
with IRQ remapping — that APIC ID limit really is the basis on which
this irqdomain, all by itself, makes the decision about whether it's
capable of being the parent irqdomain to the requesting device.

It just so happens that *if* you allow higher APIC IDs in the system
and *if* no other irqdomains exist which take ownership of a given HPET
or I/OAPIC, then that child device will fail to find a suitable parent
irqdomain and will fail to initialise. And that's just how we want it.

Sure, we have that special case in x2apic startup where we ensure that
if we don't have IR then we limit the APIC IDs. But that's there, and
*this* code is perfectly self-contained and isolated.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-13 11:51                                             ` David Woodhouse
@ 2020-10-13 12:40                                               ` Thomas Gleixner
  0 siblings, 0 replies; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-13 12:40 UTC (permalink / raw)
  To: David Woodhouse, x86, Marc Zyngier; +Cc: kvm, Paolo Bonzini, linux-kernel

On Tue, Oct 13 2020 at 12:51, David Woodhouse wrote:
> With that realisation, I've fixed the comment in my ext_dest_id branch
> to remove all mention of IRQ remapping. It now looks like this:
>
> static int x86_vector_select(struct irq_domain *d, struct irq_fwspec *fwspec,
> 			     enum irq_domain_bus_token bus_token)
> {
> 	/*
> 	 * HPET and I/OAPIC drivers use irq_find_matching_irqdomain()
> 	 * to find their parent irqdomain. For x86_vector_domain to be
> 	 * suitable, all CPUs in the system must be reachable by its
> 	 * x86_vector_msi_compose_msg() function. Which is only true
> 	 * in !x2apic mode, or in x2apic physical mode if APIC IDs were
> 	 * restricted to 8 or 15 bits at boot time. In those cases,
> 	 * 1<<15 will *not* be a valid APIC ID.
> 	 */
> 	if (apic->apic_id_valid(1<<15))
> 		return 0;
>
> 	return x86_fwspec_is_ioapic(fwspec) || x86_fwspec_is_hpet(fwspec);
> }
>
> That makes it clearer that this isn't just some incestuous interaction
> with IRQ remapping — that APIC ID limit really is the basis on which
> this irqdomain, all by itself, makes the decision about whether it's
> capable of being the parent irqdomain to the requesting device.

Yes, that makes sense now.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-09 10:46   ` [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message David Woodhouse
@ 2020-10-22 21:43     ` Thomas Gleixner
  2020-10-22 22:10       ` Thomas Gleixner
  2020-10-23 10:10       ` David Woodhouse
  0 siblings, 2 replies; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-22 21:43 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel, linux-hyperv

On Fri, Oct 09 2020 at 11:46, David Woodhouse wrote:

@@ -45,12 +45,11 @@ enum irq_alloc_type {
 };

> +static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, void *_entry)
> +{
> +	struct msi_msg msg;
> +	u32 *entry = _entry;

Why is this a void * argument and then converting it to a u32 *? Just to
make that function completely unreadable?

> +
> +	irq_chip_compose_msi_msg(irq_data, &msg);

Lacks a comment. Also mp_swizzle... is a misnomer as this invokes the
msi compose function which is not what the function name suggests.

> +	/*
> +	 * They're in a bit of a random order for historical reasons, but
> +	 * the IO/APIC is just a device for turning interrupt lines into
> +	 * MSIs, and various bits of the MSI addr/data are just swizzled
> +	 * into/from the bits of Redirection Table Entry.
> +	 */
> +	entry[0] &= 0xfffff000;
> +	entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK |
> +				 MSI_DATA_VECTOR_MASK));
> +	entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9;
> +
> +	entry[1] &= 0xffff;
> +	entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12;

Sorry, but this is unreviewable gunk. The whole msi_msg setup sucks with
this unholy macro maze. I have a half finished series which allows
architectures to provide shadow members for data, address_* so this can
be done proper with bitfields.

Aside of that it works magically because polarity,trigger and mask bit
have been set up before. But of course a comment about this is
completely overrated.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-22 21:43     ` Thomas Gleixner
@ 2020-10-22 22:10       ` Thomas Gleixner
  2020-10-23 17:04         ` David Woodhouse
  2020-10-23 10:10       ` David Woodhouse
  1 sibling, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-22 22:10 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel, linux-hyperv

On Thu, Oct 22 2020 at 23:43, Thomas Gleixner wrote:
> On Fri, Oct 09 2020 at 11:46, David Woodhouse wrote:
> Aside of that it works magically because polarity,trigger and mask bit
> have been set up before. But of course a comment about this is
> completely overrated.

Also this part:

> -static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
> -			   struct IO_APIC_route_entry *entry)
> +static void mp_setup_entry(struct irq_data *irq_data, struct mp_chip_data *data)
>  {
> +	struct IO_APIC_route_entry *entry = &data->entry;
> +
>  	memset(entry, 0, sizeof(*entry));
> -	entry->delivery_mode = apic->irq_delivery_mode;
> -	entry->dest_mode     = apic->irq_dest_mode;
> -	entry->dest	     = cfg->dest_apicid & 0xff;
> -	entry->virt_ext_dest = cfg->dest_apicid >> 8;
> -	entry->vector	     = cfg->vector;
> +
> +	mp_swizzle_msi_dest_bits(irq_data, entry);
> +
>  	entry->trigger	     = data->trigger;
>  	entry->polarity	     = data->polarity;
>  	/*

does not make sense. It did not make sense before either, but now it
does even make less sense.

During allocation this only needs to setup the I/O-APIC specific bits
(trigger, polarity, mask). The rest is filled in when the actual
activation happens. Nothing writes that entry _before_ activation.

/me goes to mop up more


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-22 21:43     ` Thomas Gleixner
  2020-10-22 22:10       ` Thomas Gleixner
@ 2020-10-23 10:10       ` David Woodhouse
  2020-10-23 21:28         ` Thomas Gleixner
  1 sibling, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-23 10:10 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel, linux-hyperv



On 22 October 2020 22:43:52 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Fri, Oct 09 2020 at 11:46, David Woodhouse wrote:
>
>@@ -45,12 +45,11 @@ enum irq_alloc_type {
> };
>
>> +static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, void
>*_entry)
>> +{
>> +	struct msi_msg msg;
>> +	u32 *entry = _entry;
>
>Why is this a void * argument and then converting it to a u32 *? Just
>to
>make that function completely unreadable?

It makes the callers slightly more readable, not having to cast to uint32_t* from the struct.

I did ponder defining a new struct with bitfields named along the lines of 'msi_addr_bits_19_to_4', but that seemed like overkill.

>> +
>> +	irq_chip_compose_msi_msg(irq_data, &msg);
>
>Lacks a comment. Also mp_swizzle... is a misnomer as this invokes the
>msi compose function which is not what the function name suggests.

It's got a four-line comment right there after it as we use the bits we got from it.

>> +	/*
>> +	 * They're in a bit of a random order for historical reasons, but
>> +	 * the IO/APIC is just a device for turning interrupt lines into
>> +	 * MSIs, and various bits of the MSI addr/data are just swizzled
>> +	 * into/from the bits of Redirection Table Entry.
>> +	 */
>> +	entry[0] &= 0xfffff000;
>> +	entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK |
>> +				 MSI_DATA_VECTOR_MASK));
>> +	entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9;
>> +
>> +	entry[1] &= 0xffff;
>> +	entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12;
>
>Sorry, but this is unreviewable gunk.

Crap. Sure, want to look at the I/OAPIC and MSI documentation and observe that it's just shifting bits around and "these bits go there, those bits go here..." but there's no magic which will make that go away. At some point you end up swizzling bits around with seemingly random bitmasks and shifts which you have to have worked out from the docs.

Now that I killed off the various IOMMU bits which also horrifically touch the RTE directly, perhaps there is more scope for faffing around with it differently, but it won't really change much. It's just urinating into the atmospheric disturbance.


>Aside of that it works magically because polarity,trigger and mask bit
>have been set up before. But of course a comment about this is
>completely overrated.

Well yes, there's a lot about the I/OAPIC code which is a bit horrid but this patch is doing just one thing: making the bits get from e.g. cfg->dest_apicid and cfg->vector into the RTE in a generic fashion which does the right thing for IR too. Other cleanups are somewhat orthogonal, but yes it's a good spot that one of these was somewhat redundant in the first place. We could fix that up in a separate patch which comes first perhaps.

If we're starting to clean up I/OAPIC code I'd like to take a long hard look at the level ack code paths, and the fact that we have separate ack_apic_irq and apic_ack_irq functions (or something like that; on phone now) which do different things. Perhaps the order (or the capitals on APIC which one of them has) makes it sane and meaningful for them both to exist and do different things?

I also note the Hyper-V "remapping" takes the IR code path and I'm not sure that's OK. Because of the complete lack of overall documentation on what it's all doing and why.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-22 22:10       ` Thomas Gleixner
@ 2020-10-23 17:04         ` David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-23 17:04 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: kvm, Paolo Bonzini, linux-kernel, linux-hyperv

[-- Attachment #1: Type: text/plain, Size: 15343 bytes --]

On Fri, 2020-10-23 at 00:10 +0200, Thomas Gleixner wrote:
> > -static void mp_setup_entry(struct irq_cfg *cfg, struct
> > mp_chip_data *data,
> > -                        struct IO_APIC_route_entry *entry)
> > +static void mp_setup_entry(struct irq_data *irq_data, struct
> > mp_chip_data *data)
> >   {
> > +     struct IO_APIC_route_entry *entry = &data->entry;
> > +
> >        memset(entry, 0, sizeof(*entry));
> > -     entry->delivery_mode = apic->irq_delivery_mode;
> > -     entry->dest_mode     = apic->irq_dest_mode;
> > -     entry->dest          = cfg->dest_apicid & 0xff;
> > -     entry->virt_ext_dest = cfg->dest_apicid >> 8;
> > -     entry->vector        = cfg->vector;
> > +
> > +     mp_swizzle_msi_dest_bits(irq_data, entry);
> > +
> >        entry->trigger       = data->trigger;
> >        entry->polarity      = data->polarity;
> >        /*
> 
> does not make sense. It did not make sense before either, but now it
> does even make less sense.
> 
> During allocation this only needs to setup the I/O-APIC specific bits
> (trigger, polarity, mask). The rest is filled in when the actual
> activation happens. Nothing writes that entry _before_ activation.
> 
> /me goes to mop up more

Yeah... that code was indeed a pile of crap before I looked at it,
wasn't it? And I indeed failed to spot it and mop it up as I touched
it.

Here's the version I've just pushed to my tree, which I'll test
properly both with and without IR over the weekend before posting v3.

There is no way that bit swizzling is every going to be anything short
of fugly. That's just the reality of the hardware. Even doing it with
bitfields is just going to be masking the issue by having structure
definitions that don't actually match the I/OAPIC documentation. Better
to be up front about it. I've added more words though; more words
always help...

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id

From f912b52996d381fd8a631dd10c713772c2ade478 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@amazon.co.uk>
Date: Thu, 8 Oct 2020 15:44:42 +0100
Subject: [PATCH 10/19] x86/ioapic: Generate RTE directly from parent irqchip's
 MSI message

The I/OAPIC generates an MSI cycle with address/data bits taken from its
Redirection Table Entry in some combination which used to make sense,
but now is just a bunch of bits which get passed through in some
seemingly arbitrary order.

Instead of making IRQ remapping drivers directly frob the I/OAPIC RTE,
let them just do their job and generate an MSI message. The bit
swizzling to turn that MSI message into the IOAPIC's RTE is the same in
all cases, since it's a function of the I/OAPIC hardware. The IRQ
remappers have no real need to get involved with that.

The only slight caveat is that the I/OAPIC is interpreting some of
those fields too, and it does want the 'vector' field to be unique
to make EOI work. The AMD IOMMU happens to put its IRTE index in the
bits that the I/OAPIC thinks are the vector field, and accommodates
this requirement by reserving the first 32 indices for the I/OAPIC.
The Intel IOMMU doesn't actually use the bits that the I/OAPIC thinks
are the vector field, so it fills in the 'pin' value there instead.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/hw_irq.h       | 11 ++--
 arch/x86/include/asm/msidef.h       |  2 +
 arch/x86/kernel/apic/io_apic.c      | 81 ++++++++++++++++++++++-------
 drivers/iommu/amd/iommu.c           | 14 -----
 drivers/iommu/hyperv-iommu.c        | 31 -----------
 drivers/iommu/intel/irq_remapping.c | 19 ++-----
 6 files changed, 74 insertions(+), 84 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index a4aeeaace040..aabd8f1b6bb0 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -45,12 +45,11 @@ enum irq_alloc_type {
 };
 
 struct ioapic_alloc_info {
-	int				pin;
-	int				node;
-	u32				trigger : 1;
-	u32				polarity : 1;
-	u32				valid : 1;
-	struct IO_APIC_route_entry	*entry;
+	int		pin;
+	int		node;
+	u32		trigger : 1;
+	u32		polarity : 1;
+	u32		valid : 1;
 };
 
 struct uv_alloc_info {
diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index ee2f8ccc32d0..37c3d2d492c9 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -18,6 +18,7 @@
 #define MSI_DATA_DELIVERY_MODE_SHIFT	8
 #define  MSI_DATA_DELIVERY_FIXED	(0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
+#define  MSI_DATA_DELIVERY_MODE_MASK	(3 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
 #define MSI_DATA_LEVEL_SHIFT		14
 #define	 MSI_DATA_LEVEL_DEASSERT	(0 << MSI_DATA_LEVEL_SHIFT)
@@ -37,6 +38,7 @@
 #define MSI_ADDR_DEST_MODE_SHIFT	2
 #define  MSI_ADDR_DEST_MODE_PHYSICAL	(0 << MSI_ADDR_DEST_MODE_SHIFT)
 #define	 MSI_ADDR_DEST_MODE_LOGICAL	(1 << MSI_ADDR_DEST_MODE_SHIFT)
+#define  MSI_ADDR_DEST_MODE_MASK	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
 #define MSI_ADDR_REDIRECTION_SHIFT	3
 #define  MSI_ADDR_REDIRECTION_CPU	(0 << MSI_ADDR_REDIRECTION_SHIFT)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 54f6a029b1d1..b9e6236af833 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -48,6 +48,7 @@
 #include <linux/jiffies.h>	/* time_after() */
 #include <linux/slab.h>
 #include <linux/memblock.h>
+#include <linux/msi.h>
 
 #include <asm/irqdomain.h>
 #include <asm/io.h>
@@ -63,6 +64,7 @@
 #include <asm/setup.h>
 #include <asm/irq_remapping.h>
 #include <asm/hw_irq.h>
+#include <asm/msidef.h>
 
 #include <asm/apic.h>
 
@@ -1851,22 +1853,64 @@ static void ioapic_ir_ack_level(struct irq_data *irq_data)
 	eoi_ioapic_pin(data->entry.vector, data);
 }
 
+static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data,
+				     struct IO_APIC_route_entry *rte)
+{
+	struct msi_msg msg;
+	u32 *entry = (u32 *)rte;
+
+	/*
+	 * They're in a bit of a random order for historical reasons, but
+	 * the I/OAPIC is just a device for turning interrupt lines into
+	 * MSIs, and various bits of the MSI addr/data are just swizzled
+	 * into/from the bits of Redirection Table Entry. So let the
+	 * upstream irqdomain (be it interrupt remapping or otherwise)
+	 * compose the MSI message, and we'll shift the bits into the
+	 * appropriate place in the RTE.
+	 */
+	irq_chip_compose_msi_msg(irq_data, &msg);
+
+	/*
+	 * The low 12 bits of the RTE were historically the vector,
+	 * delivery_mode and destination mode. Which come from the
+	 * low 8 bits of the MSI data, the *next* 3 bits of the MSI
+	 * data, and bit 2 of the MSI address (which thus has to be
+	 * shifted up by 9 to land in the right place in bit 11 of
+	 * the RTE).
+	 *
+	 * With Interrupt Remapping of course many bits in the MSI
+	 * have different meanings but the bit-swizzling of the
+	 * I/OAPIC hardware remains the same.
+	 */
+	entry[0] &= 0xfffff000;
+	entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK |
+				 MSI_DATA_VECTOR_MASK));
+	entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9;
+
+	/*
+	 * Top 16 bits of the RTE are the destination and extended
+	 * destination ID fields, which come from bits 19-4 of the
+	 * MSI address.
+	 */
+	entry[1] &= 0xffff;
+	entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12;
+}
+
+
 static void ioapic_configure_entry(struct irq_data *irqd)
 {
 	struct mp_chip_data *mpd = irqd->chip_data;
-	struct irq_cfg *cfg = irqd_cfg(irqd);
 	struct irq_pin_list *entry;
 
 	/*
-	 * Only update when the parent is the vector domain, don't touch it
-	 * if the parent is the remapping domain. Check the installed
-	 * ioapic chip to verify that.
+	 * The polarity, trigger and mask bits have already been
+	 * set up at allocation time by mp_setup_entry(). What
+	 * remains on activation and set_affinity is to set up
+	 * the various destination bits which are obtained from
+	 * the upstream irq domain's generated MSI message.
 	 */
-	if (irqd->chip == &ioapic_chip) {
-		mpd->entry.dest = cfg->dest_apicid & 0xff;
-		mpd->entry.virt_ext_dest = cfg->dest_apicid >> 8;
-		mpd->entry.vector = cfg->vector;
-	}
+	mp_swizzle_msi_dest_bits(irqd, &mpd->entry);
+
 	for_each_irq_pin(entry, mpd->irq_2_pin)
 		__ioapic_write_entry(entry->apic, entry->pin, mpd->entry);
 }
@@ -2949,15 +2993,16 @@ static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data,
 	}
 }
 
-static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
-			   struct IO_APIC_route_entry *entry)
+static void mp_setup_entry(struct irq_data *irq_data, struct mp_chip_data *data)
 {
+	struct IO_APIC_route_entry *entry = &data->entry;
+
+	/*
+	 * The destination bits get set up by ioapic_configure_entry()
+	 * when the IRQ is activated. For now just set up the I/OAPIC
+	 * specific fields.
+	 */
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode = apic->irq_delivery_mode;
-	entry->dest_mode     = apic->irq_dest_mode;
-	entry->dest	     = cfg->dest_apicid & 0xff;
-	entry->virt_ext_dest = cfg->dest_apicid >> 8;
-	entry->vector	     = cfg->vector;
 	entry->trigger	     = data->trigger;
 	entry->polarity	     = data->polarity;
 	/*
@@ -2995,7 +3040,6 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	if (!data)
 		return -ENOMEM;
 
-	info->ioapic.entry = &data->entry;
 	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info);
 	if (ret < 0) {
 		kfree(data);
@@ -3013,8 +3057,7 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	add_pin_to_irq_node(data, ioapic_alloc_attr_node(info), ioapic, pin);
 
 	local_irq_save(flags);
-	if (info->ioapic.entry)
-		mp_setup_entry(cfg, data, info->ioapic.entry);
+	mp_setup_entry(irq_data, data);
 	mp_register_handler(virq, data->trigger);
 	if (virq < nr_legacy_irqs())
 		legacy_pic->mask(virq);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ef64e01f66d7..13d0a8f42d56 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3597,7 +3597,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 {
 	struct irq_2_irte *irte_info = &data->irq_2_irte;
 	struct msi_msg *msg = &data->msi_entry;
-	struct IO_APIC_route_entry *entry;
 	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
 	if (!iommu)
@@ -3611,19 +3610,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-		/* Setup IOAPIC entry */
-		entry = info->ioapic.entry;
-		info->ioapic.entry = NULL;
-		memset(entry, 0, sizeof(*entry));
-		entry->vector        = index;
-		entry->mask          = 0;
-		entry->trigger       = info->ioapic.trigger;
-		entry->polarity      = info->ioapic.polarity;
-		/* Mask level triggered irqs. */
-		if (info->ioapic.trigger)
-			entry->mask = 1;
-		break;
-
 	case X86_IRQ_ALLOC_TYPE_HPET:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 12ec31534995..3a674262cc91 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -40,7 +40,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 {
 	struct irq_data *parent = data->parent_data;
 	struct irq_cfg *cfg = irqd_cfg(data);
-	struct IO_APIC_route_entry *entry;
 	int ret;
 
 	/* Return error If new irq affinity is out of ioapic_max_cpumask. */
@@ -51,9 +50,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
 		return ret;
 
-	entry = data->chip_data;
-	entry->dest = cfg->dest_apicid;
-	entry->vector = cfg->vector;
 	send_cleanup_vector(cfg);
 
 	return 0;
@@ -89,20 +85,6 @@ static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
 
 	irq_data->chip = &hyperv_ir_chip;
 
-	/*
-	 * If there is interrupt remapping function of IOMMU, setting irq
-	 * affinity only needs to change IRTE of IOMMU. But Hyper-V doesn't
-	 * support interrupt remapping function, setting irq affinity of IO-APIC
-	 * interrupts still needs to change IO-APIC registers. But ioapic_
-	 * configure_entry() will ignore value of cfg->vector and cfg->
-	 * dest_apicid when IO-APIC's parent irq domain is not the vector
-	 * domain.(See ioapic_configure_entry()) In order to setting vector
-	 * and dest_apicid to IO-APIC register, IO-APIC entry pointer is saved
-	 * in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_
-	 * affinity() set vector and dest_apicid directly into IO-APIC entry.
-	 */
-	irq_data->chip_data = info->ioapic.entry;
-
 	/*
 	 * Hypver-V IO APIC irq affinity should be in the scope of
 	 * ioapic_max_cpumask because no irq remapping support.
@@ -119,22 +101,9 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain,
 	irq_domain_free_irqs_common(domain, virq, nr_irqs);
 }
 
-static int hyperv_irq_remapping_activate(struct irq_domain *domain,
-			  struct irq_data *irq_data, bool reserve)
-{
-	struct irq_cfg *cfg = irqd_cfg(irq_data);
-	struct IO_APIC_route_entry *entry = irq_data->chip_data;
-
-	entry->dest = cfg->dest_apicid;
-	entry->vector = cfg->vector;
-
-	return 0;
-}
-
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
 	.alloc = hyperv_irq_remapping_alloc,
 	.free = hyperv_irq_remapping_free,
-	.activate = hyperv_irq_remapping_activate,
 };
 
 static int __init hyperv_prepare_irq_remapping(void)
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 0cfce1d3b7bb..511dfb4884bc 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1265,7 +1265,6 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 					     struct irq_alloc_info *info,
 					     int index, int sub_handle)
 {
-	struct IR_IO_APIC_route_entry *entry;
 	struct irte *irte = &data->irte_entry;
 	struct msi_msg *msg = &data->msi_entry;
 
@@ -1281,23 +1280,15 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 			irte->avail, irte->vector, irte->dest_id,
 			irte->sid, irte->sq, irte->svt);
 
-		entry = (struct IR_IO_APIC_route_entry *)info->ioapic.entry;
-		info->ioapic.entry = NULL;
-		memset(entry, 0, sizeof(*entry));
-		entry->index2	= (index >> 15) & 0x1;
-		entry->zero	= 0;
-		entry->format	= 1;
-		entry->index	= (index & 0x7fff);
 		/*
 		 * IO-APIC RTE will be configured with virtual vector.
 		 * irq handler will do the explicit EOI to the io-apic.
 		 */
-		entry->vector	= info->ioapic.pin;
-		entry->mask	= 0;			/* enable IRQ */
-		entry->trigger	= info->ioapic.trigger;
-		entry->polarity	= info->ioapic.polarity;
-		if (info->ioapic.trigger)
-			entry->mask = 1; /* Mask level triggered irqs. */
+		msg->data = info->ioapic.pin;
+		msg->address_hi = MSI_ADDR_BASE_HI;
+		msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
+				  MSI_ADDR_IR_INDEX1(index) |
+				  MSI_ADDR_IR_INDEX2(index);
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
-- 
2.17.1


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-23 10:10       ` David Woodhouse
@ 2020-10-23 21:28         ` Thomas Gleixner
  2020-10-24  8:26           ` David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-23 21:28 UTC (permalink / raw)
  To: David Woodhouse, x86; +Cc: kvm, Paolo Bonzini, linux-kernel, linux-hyperv

On Fri, Oct 23 2020 at 11:10, David Woodhouse wrote:
> On 22 October 2020 22:43:52 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
> It makes the callers slightly more readable, not having to cast to uint32_t* from the struct.
>
> I did ponder defining a new struct with bitfields named along the
> lines of 'msi_addr_bits_19_to_4', but that seemed like overkill.

I did something like this in the meantime, because all of this just
sucks.

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/apic

Hot of the press and completely untested.

>>> +	/*
>>> +	 * They're in a bit of a random order for historical reasons, but
>>> +	 * the IO/APIC is just a device for turning interrupt lines into
>>> +	 * MSIs, and various bits of the MSI addr/data are just swizzled
>>> +	 * into/from the bits of Redirection Table Entry.
>>> +	 */
>>> +	entry[0] &= 0xfffff000;
>>> +	entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK |
>>> +				 MSI_DATA_VECTOR_MASK));
>>> +	entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9;
>>> +
>>> +	entry[1] &= 0xffff;
>>> +	entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12;
>>
>>Sorry, but this is unreviewable gunk.
>
> Crap. Sure, want to look at the I/OAPIC and MSI documentation and
> observe that it's just shifting bits around and "these bits go there,
> those bits go here..." but there's no magic which will make that go
> away. At some point you end up swizzling bits around with seemingly
> random bitmasks and shifts which you have to have worked out from the
> docs.

Yes, we can't avoid the bit swizzling at all. But it can be made more
readable.

> Now that I killed off the various IOMMU bits which also horrifically
> touch the RTE directly, perhaps there is more scope for faffing around
> with it differently, but it won't really change much. It's just
> urinating into the atmospheric disturbance.

Well, you can look at it this way, but I surely have better things to do
than taking pencil and paper and drawing up mappings when reading
through that code or a patch against it.

>> Aside of that it works magically because polarity,trigger and mask bit
>> have been set up before. But of course a comment about this is
>> completely overrated.
>
> Well yes, there's a lot about the I/OAPIC code which is a bit horrid
> but this patch is doing just one thing: making the bits get from
> e.g. cfg->dest_apicid and cfg->vector into the RTE in a generic
> fashion which does the right thing for IR too. Other cleanups are
> somewhat orthogonal, but yes it's a good spot that one of these was
> somewhat redundant in the first place. We could fix that up in a
> separate patch which comes first perhaps.

Yes, that code is horrid, but adding a comment to that effect when
changing it is not asked too much, right?

> If we're starting to clean up I/OAPIC code I'd like to take a long
> hard look at the level ack code paths, and the fact that we have
> separate ack_apic_irq and apic_ack_irq functions (or something like
> that; on phone now) which do different things. Perhaps the order (or
> the capitals on APIC which one of them has) makes it sane and
> meaningful for them both to exist and do different things?

The naming conventions are definitely not sane.

> I also note the Hyper-V "remapping" takes the IR code path and I'm not
> sure that's OK. Because of the complete lack of overall documentation
> on what it's all doing and why.

I stared into that today as well. There is a fundamental difference
between real hardware remapping and this.

Especially the affinity setting mode changes with remapping and hyper-v
goes into the remap path, but that hyperv driver does not issue
"irq_set_status_flags(virq, IRQ_MOVE_PCNTXT)", which means that you
can't change the affinity of the I/O-APIC interrupts in that hyperv mode
at all.

If that flag is not set then affinity changes are done in actual
interrupt context in ioapic_ack_level() which is only used for the
non-IR chip ...

I'm still wrapping my head around getting rid of this thing completely
because now it's just a subset of your KVM case with the only
restriction that I/O-APIC cannot be affined to any CPU with a APIC id
greater than 255.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-23 21:28         ` Thomas Gleixner
@ 2020-10-24  8:26           ` David Woodhouse
  2020-10-24  8:41             ` David Woodhouse
  2020-10-24  9:13             ` Paolo Bonzini
  0 siblings, 2 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-24  8:26 UTC (permalink / raw)
  To: Thomas Gleixner, x86
  Cc: kvm, Paolo Bonzini, linux-kernel, linux-hyperv, Dexuan Cui

[-- Attachment #1: Type: text/plain, Size: 3767 bytes --]

On Fri, 2020-10-23 at 23:28 +0200, Thomas Gleixner wrote:
> On Fri, Oct 23 2020 at 11:10, David Woodhouse wrote:
> > On 22 October 2020 22:43:52 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
> > It makes the callers slightly more readable, not having to cast to uint32_t* from the struct.
> > 
> > I did ponder defining a new struct with bitfields named along the
> > lines of 'msi_addr_bits_19_to_4', but that seemed like overkill.
> 
> I did something like this in the meantime, because all of this just
> sucks.
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/apic
> 
> Hot of the press and completely untested.

Hm, your struct IO_APIC_route_entry isn't actually a union; you've
defined a 128-bit structure with the IR fields *following* the non-IR
fields. But there *is* a union in io_apic.c, of that 128-bit structure
and two uint32_ts. Suspect that wasn't quite what you intended. I'll
prod at it this morning and turn it into a single union of the three,
and give it some testing.

Also, my "Move MSI support into hpet.c" patch¹ got updated to
s/CONFIG_PCI_MSI/CONFIG_GENERIC_MSI_IRQ/ at about line 53 for the MSI-
related variable declarations, which was going to be in the next
version I posted.

I was also hoping Paolo was going to take the patch which just defines
the KVM_FEATURE_MSI_EXT_DEST_ID bit² ASAP, so that we end up with a
second patch³ that *just* wires it up to x86_init.msi_ext_dest_id() for
KVM.

¹ https://git.infradead.org/users/dwmw2/linux.git/commitdiff/734719c1f4
² https://git.infradead.org/users/dwmw2/linux.git/commitdiff/3f371d6749
³ https://git.infradead.org/users/dwmw2/linux.git/commitdiff/8399e14eb5

> Yes, we can't avoid the bit swizzling at all. But it can be made more
> readable.

Hm, I was about to concede that your version is a bit more readable.

But then I got to your new __ipi_msi_compose_msg() and realised that it
isn't working because it's setting the 0xFEE base address in the _low_
bits, somehow...

	msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
	printk("1 Compose MSI message %x/%x\n", msg->address_lo, msg->data);
	msg->arch_addr_lo.dest_mode_logical = apic->dest_mode_logical;
	printk("2 Compose MSI message %x/%x\n", msg->address_lo, msg->data);
	msg->arch_addr_lo.destid_0_7 = cfg->dest_apicid & 0xFF;
	printk("3 Compose MSI message %x/%x\n", msg->address_lo, msg->data);

[    1.793874] 1 Compose MSI message fee/0
[    1.794310] 2 Compose MSI message fee/0
[    1.794768] 3 Compose MSI message f02/0

And now I wish it was just a simple shift instead of an unholy maze of
overlapping unions of bitfields. But I'll make more coffee and stare at
it harder...

> Yes, that code is horrid, but adding a comment to that effect when
> changing it is not asked too much, right?

Sure. I just actually hadn't noticed that setting the dest/vector bits
right there was entirely redundant in the first place.

> I'm still wrapping my head around getting rid of this thing completely
> because now it's just a subset of your KVM case with the only
> restriction that I/O-APIC cannot be affined to any CPU with a APIC id
> greater than 255.

It was only ever that restriction anyway, wasn't it? Hyper-V PCI has
its own MSI handling, and there's no HPET so it was only ever the
I/OAPIC which was problematic there.

There are Hyper-V VM sizes with 416 vCPUs which depend on this today,
and which don't have the 15-bit MSI extension. Removing hyperv-iommu
would prevent us from using all the vCPUs on those. You *could* make
hyperv-iommu decline to initialise if x86_init.msi_ext_dest_id()
returns true though⁴.

⁴ https://git.infradead.org/users/dwmw2/linux.git/commitdiff/633ccf0d42

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-24  8:26           ` David Woodhouse
@ 2020-10-24  8:41             ` David Woodhouse
  2020-10-24  9:13             ` Paolo Bonzini
  1 sibling, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-24  8:41 UTC (permalink / raw)
  To: Thomas Gleixner, x86
  Cc: kvm, Paolo Bonzini, linux-kernel, linux-hyperv, Dexuan Cui

[-- Attachment #1: Type: text/plain, Size: 1888 bytes --]

On Sat, 2020-10-24 at 09:26 +0100, David Woodhouse wrote:
> 
> And now I wish it was just a simple shift instead of an unholy maze of
> overlapping unions of bitfields. But I'll make more coffee and stare at
> it harder...

Hah, it really *was* unions of the bitfields. This boots...

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index c65e89bedc93..3e14adba34ba 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -65,6 +65,8 @@ union IO_APIC_reg_03 {
 };
 
 struct IO_APIC_route_entry {
+  union {
+    struct {
 	u64	vector			:  8,
 		delivery_mode		:  3,
 		dest_mode_logical	:  1,
@@ -77,7 +79,8 @@ struct IO_APIC_route_entry {
 		reserved_1		: 17,
 		virt_destid_8_14	:  7,
 		destid_0_7		:  8;
-
+    };
+    struct {
 	u64	ir_shared_0		:  8,
 		ir_zero			:  3,
 		ir_index_15		:  1,
@@ -85,6 +88,8 @@ struct IO_APIC_route_entry {
 		ir_reserved_0		: 31,
 		ir_format		:  1,
 		ir_index_0_14		: 15;
+    };
+  };
 } __attribute__ ((packed));
 
 struct irq_alloc_info;
diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index 8cf82d134b78..ef9dfaa4603f 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -25,6 +25,7 @@ typedef struct x86_msi_data {
 
 typedef struct x86_msi_addr_lo {
 	union {
+		struct {
 		u32	reserved_0		:  2,
 			dest_mode_logical	:  1,
 			redirect_hint		:  1,
@@ -32,13 +33,15 @@ typedef struct x86_msi_addr_lo {
 			virt_destid_8_14	:  7,
 			destid_0_7		:  8,
 			base_address		: 12;
-
+		};
+		struct {
 		u32	dmar_reserved_0		:  2,
 			dmar_index_15		:  1,
 			dmar_subhandle_valid	:  1,
 			dmar_format		:  1,
 			dmar_index_0_14		: 15,
 			dmar_base_address	: 12;
+		};
 	};
 } __attribute__ ((packed)) arch_msi_msg_addr_lo_t;
 #define arch_msi_msg_addr_lo	x86_msi_addr_lo


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-24  8:26           ` David Woodhouse
  2020-10-24  8:41             ` David Woodhouse
@ 2020-10-24  9:13             ` Paolo Bonzini
  2020-10-24 10:13               ` David Woodhouse
  1 sibling, 1 reply; 192+ messages in thread
From: Paolo Bonzini @ 2020-10-24  9:13 UTC (permalink / raw)
  To: David Woodhouse, Thomas Gleixner, x86
  Cc: kvm, linux-kernel, linux-hyperv, Dexuan Cui

On 24/10/20 10:26, David Woodhouse wrote:
> I was also hoping Paolo was going to take the patch which just defines
> the KVM_FEATURE_MSI_EXT_DEST_ID bit² ASAP, so that we end up with a
> second patch³ that *just* wires it up to x86_init.msi_ext_dest_id() for
> KVM.
> 
> ¹ https://git.infradead.org/users/dwmw2/linux.git/commitdiff/734719c1f4
> ² https://git.infradead.org/users/dwmw2/linux.git/commitdiff/3f371d6749
> ³ https://git.infradead.org/users/dwmw2/linux.git/commitdiff/8399e14eb5

Yes, I am going to take it.

I was already sort of playing with fire with the 5.10 pull request (and
with me being lousy in general during the 5.10 development period, to be
honest), so I left it for rc2 or rc3.  It's just docs and it happened to
conflict with another documentation patch that had gone in through Jon
Corbet's tree.

Paolo


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-24  9:13             ` Paolo Bonzini
@ 2020-10-24 10:13               ` David Woodhouse
  2020-10-24 12:44                 ` David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 10:13 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, x86
  Cc: kvm, linux-kernel, linux-hyperv, Dexuan Cui



On 24 October 2020 10:13:36 BST, Paolo Bonzini <pbonzini@redhat.com> wrote:
>On 24/10/20 10:26, David Woodhouse wrote:
>> I was also hoping Paolo was going to take the patch which just
>defines
>> the KVM_FEATURE_MSI_EXT_DEST_ID bit² ASAP, so that we end up with a
>> second patch³ that *just* wires it up to x86_init.msi_ext_dest_id()
>for
>> KVM.
>> 
>> ¹
>https://git.infradead.org/users/dwmw2/linux.git/commitdiff/734719c1f4
>> ²
>https://git.infradead.org/users/dwmw2/linux.git/commitdiff/3f371d6749
>> ³
>https://git.infradead.org/users/dwmw2/linux.git/commitdiff/8399e14eb5
>
>Yes, I am going to take it.
>
>I was already sort of playing with fire with the 5.10 pull request (and
>with me being lousy in general during the 5.10 development period, to
>be
>honest), so I left it for rc2 or rc3.  It's just docs and it happened
>to
>conflict with another documentation patch that had gone in through Jon
>Corbet's tree.

OK, thanks. I'll rework Thomas's tree with that first and the other changes I'd mentioned in my parts, as well as fixing up that unholy chimæra of struct/union in which we set some bitfields from each side of the union, test and push it out later today.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-24 10:13               ` David Woodhouse
@ 2020-10-24 12:44                 ` David Woodhouse
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 12:44 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, x86
  Cc: kvm, linux-kernel, linux-hyperv, Dexuan Cui

[-- Attachment #1: Type: text/plain, Size: 18923 bytes --]

On Sat, 2020-10-24 at 11:13 +0100, David Woodhouse wrote:
> OK, thanks. I'll rework Thomas's tree with that first and the other
> changes I'd mentioned in my parts, as well as fixing up that unholy
> chimæra of struct/union in which we set some bitfields from each side
> of the union, test and push it out later today.

OK, pushed out to 
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/x86/apic]

It's Thomas's tree plus the struct/union fixes and other things I
mentioned earlier, a few comment fixes in the 'Generate RTE directly
from parent irqchip' patch, but the most interesting part is finishing
the job of the 'Cleanup IO/APIC route entry structs' patch...

From 54b623fc2b03eadb76485b4ca0ade3e79acf6c27 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx@linutronix.de>
Date: Thu, 22 Oct 2020 14:48:18 +0200
Subject: [PATCH 124/137] x86/ioapic: Cleanup IO/APIC route entry structs

Having two seperate structs for the I/O-APIC RTE entries (non-remapped and
DMAR remapped) requires type casts and makes it hard to map.

Combine them in IO_APIC_routing_entry by defining a union of two 64bit
bitfields. Use naming which reflects which bits are shared and which bits
are actually different for the operating modes.

[dwmw2: Fix it up and finish the job, pulling the 32-bit w1,w2 words for
        register access into the same union and eliminating a few more
        places where bits were accessed through masks and shifts.]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/io_apic.h      |  78 ++++++---------
 arch/x86/kernel/apic/io_apic.c      | 144 +++++++++++++---------------
 drivers/iommu/amd/iommu.c           |   8 +-
 drivers/iommu/hyperv-iommu.c        |   4 +-
 drivers/iommu/intel/irq_remapping.c |  19 ++--
 5 files changed, 108 insertions(+), 145 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index a1a26f6d3aa4..73da644b2f0d 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -13,15 +13,6 @@
  * Copyright (C) 1997, 1998, 1999, 2000 Ingo Molnar
  */
 
-/* I/O Unit Redirection Table */
-#define IO_APIC_REDIR_VECTOR_MASK	0x000FF
-#define IO_APIC_REDIR_DEST_LOGICAL	0x00800
-#define IO_APIC_REDIR_DEST_PHYSICAL	0x00000
-#define IO_APIC_REDIR_SEND_PENDING	(1 << 12)
-#define IO_APIC_REDIR_REMOTE_IRR	(1 << 14)
-#define IO_APIC_REDIR_LEVEL_TRIGGER	(1 << 15)
-#define IO_APIC_REDIR_MASKED		(1 << 16)
-
 /*
  * The structure of the IO-APIC:
  */
@@ -65,52 +56,39 @@ union IO_APIC_reg_03 {
 };
 
 struct IO_APIC_route_entry {
-	__u32	vector		:  8,
-		delivery_mode	:  3,	/* 000: FIXED
-					 * 001: lowest prio
-					 * 111: ExtINT
-					 */
-		dest_mode	:  1,	/* 0: physical, 1: logical */
-		delivery_status	:  1,
-		polarity	:  1,
-		irr		:  1,
-		trigger		:  1,	/* 0: edge, 1: level */
-		mask		:  1,	/* 0: enabled, 1: disabled */
-		__reserved_2	: 15;
-
-	__u32	__reserved_3	: 24,
-		dest		:  8;
-} __attribute__ ((packed));
-
-struct IR_IO_APIC_route_entry {
-	__u64	vector		: 8,
-		zero		: 3,
-		index2		: 1,
-		delivery_status : 1,
-		polarity	: 1,
-		irr		: 1,
-		trigger		: 1,
-		mask		: 1,
-		reserved	: 31,
-		format		: 1,
-		index		: 15;
+	union {
+		struct {
+			u64	vector			:  8,
+				delivery_mode		:  3,
+				dest_mode_logical	:  1,
+				delivery_status		:  1,
+				active_low		:  1,
+				irr			:  1,
+				is_level		:  1,
+				masked			:  1,
+				reserved_0		: 15,
+				reserved_1		: 24,
+				destid_0_7		:  8;
+		};
+		struct {
+			u64	ir_shared_0		:  8,
+				ir_zero			:  3,
+				ir_index_15		:  1,
+				ir_shared_1		:  5,
+				ir_reserved_0		: 31,
+				ir_format		:  1,
+				ir_index_0_14		: 15;
+		};
+		struct {
+			u64	w1			: 32,
+				w2			: 32;
+		};
+	};
 } __attribute__ ((packed));
 
 struct irq_alloc_info;
 struct ioapic_domain_cfg;
 
-#define IOAPIC_EDGE			0
-#define IOAPIC_LEVEL			1
-
-#define IOAPIC_MASKED			1
-#define IOAPIC_UNMASKED			0
-
-#define IOAPIC_POL_HIGH			0
-#define IOAPIC_POL_LOW			1
-
-#define IOAPIC_DEST_MODE_PHYSICAL	0
-#define IOAPIC_DEST_MODE_LOGICAL	1
-
 #define	IOAPIC_MAP_ALLOC		0x1
 #define	IOAPIC_MAP_CHECK		0x2
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 24a7bba7cbf4..07e754131854 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -286,31 +286,26 @@ static void io_apic_write(unsigned int apic, unsigned int reg,
 	writel(value, &io_apic->data);
 }
 
-union entry_union {
-	struct { u32 w1, w2; };
-	struct IO_APIC_route_entry entry;
-};
-
 static struct IO_APIC_route_entry __ioapic_read_entry(int apic, int pin)
 {
-	union entry_union eu;
+	struct IO_APIC_route_entry entry;
 
-	eu.w1 = io_apic_read(apic, 0x10 + 2 * pin);
-	eu.w2 = io_apic_read(apic, 0x11 + 2 * pin);
+	entry.w1 = io_apic_read(apic, 0x10 + 2 * pin);
+	entry.w2 = io_apic_read(apic, 0x11 + 2 * pin);
 
-	return eu.entry;
+	return entry;
 }
 
 static struct IO_APIC_route_entry ioapic_read_entry(int apic, int pin)
 {
-	union entry_union eu;
+	struct IO_APIC_route_entry entry;
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
-	eu.entry = __ioapic_read_entry(apic, pin);
+	entry = __ioapic_read_entry(apic, pin);
 	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 
-	return eu.entry;
+	return entry;
 }
 
 /*
@@ -321,11 +316,8 @@ static struct IO_APIC_route_entry ioapic_read_entry(int apic, int pin)
  */
 static void __ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
 {
-	union entry_union eu = {{0, 0}};
-
-	eu.entry = e;
-	io_apic_write(apic, 0x11 + 2*pin, eu.w2);
-	io_apic_write(apic, 0x10 + 2*pin, eu.w1);
+	io_apic_write(apic, 0x11 + 2*pin, e.w2);
+	io_apic_write(apic, 0x10 + 2*pin, e.w1);
 }
 
 static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
@@ -344,12 +336,12 @@ static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
  */
 static void ioapic_mask_entry(int apic, int pin)
 {
+	struct IO_APIC_route_entry e = { .masked = true };
 	unsigned long flags;
-	union entry_union eu = { .entry.mask = IOAPIC_MASKED };
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
-	io_apic_write(apic, 0x10 + 2*pin, eu.w1);
-	io_apic_write(apic, 0x11 + 2*pin, eu.w2);
+	io_apic_write(apic, 0x10 + 2*pin, e.w1);
+	io_apic_write(apic, 0x11 + 2*pin, e.w2);
 	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
@@ -422,20 +414,15 @@ static void __init replace_pin_at_irq_node(struct mp_chip_data *data, int node,
 	add_pin_to_irq_node(data, node, newapic, newpin);
 }
 
-static void io_apic_modify_irq(struct mp_chip_data *data,
-			       int mask_and, int mask_or,
+static void io_apic_modify_irq(struct mp_chip_data *data, bool masked,
 			       void (*final)(struct irq_pin_list *entry))
 {
-	union entry_union eu;
 	struct irq_pin_list *entry;
 
-	eu.entry = data->entry;
-	eu.w1 &= mask_and;
-	eu.w1 |= mask_or;
-	data->entry = eu.entry;
+	data->entry.masked = masked;
 
 	for_each_irq_pin(entry, data->irq_2_pin) {
-		io_apic_write(entry->apic, 0x10 + 2 * entry->pin, eu.w1);
+		io_apic_write(entry->apic, 0x10 + 2 * entry->pin, data->entry.w1);
 		if (final)
 			final(entry);
 	}
@@ -459,13 +446,13 @@ static void mask_ioapic_irq(struct irq_data *irq_data)
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
-	io_apic_modify_irq(data, ~0, IO_APIC_REDIR_MASKED, &io_apic_sync);
+	io_apic_modify_irq(data, true, &io_apic_sync);
 	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
 static void __unmask_ioapic(struct mp_chip_data *data)
 {
-	io_apic_modify_irq(data, ~IO_APIC_REDIR_MASKED, 0, NULL);
+	io_apic_modify_irq(data, false, NULL);
 }
 
 static void unmask_ioapic_irq(struct irq_data *irq_data)
@@ -506,8 +493,8 @@ static void __eoi_ioapic_pin(int apic, int pin, int vector)
 		/*
 		 * Mask the entry and change the trigger mode to edge.
 		 */
-		entry1.mask = IOAPIC_MASKED;
-		entry1.trigger = IOAPIC_EDGE;
+		entry1.masked = true;
+		entry1.is_level = false;
 
 		__ioapic_write_entry(apic, pin, entry1);
 
@@ -542,8 +529,8 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 	 * Make sure the entry is masked and re-read the contents to check
 	 * if it is a level triggered pin and if the remote-IRR is set.
 	 */
-	if (entry.mask == IOAPIC_UNMASKED) {
-		entry.mask = IOAPIC_MASKED;
+	if (!entry.masked) {
+		entry.masked = true;
 		ioapic_write_entry(apic, pin, entry);
 		entry = ioapic_read_entry(apic, pin);
 	}
@@ -556,8 +543,8 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 		 * doesn't clear the remote-IRR if the trigger mode is not
 		 * set to level.
 		 */
-		if (entry.trigger == IOAPIC_EDGE) {
-			entry.trigger = IOAPIC_LEVEL;
+		if (!entry.is_level) {
+			entry.is_level = true;
 			ioapic_write_entry(apic, pin, entry);
 		}
 		raw_spin_lock_irqsave(&ioapic_lock, flags);
@@ -659,8 +646,8 @@ void mask_ioapic_entries(void)
 			struct IO_APIC_route_entry entry;
 
 			entry = ioapics[apic].saved_registers[pin];
-			if (entry.mask == IOAPIC_UNMASKED) {
-				entry.mask = IOAPIC_MASKED;
+			if (!entry.masked) {
+				entry.masked = true;
 				ioapic_write_entry(apic, pin, entry);
 			}
 		}
@@ -947,8 +934,8 @@ static bool mp_check_pin_attr(int irq, struct irq_alloc_info *info)
 	if (irq < nr_legacy_irqs() && data->count == 1) {
 		if (info->ioapic.is_level != data->is_level)
 			mp_register_handler(irq, info->ioapic.is_level);
-		data->entry.trigger = data->is_level = info->ioapic.is_level;
-		data->entry.polarity = data->active_low = info->ioapic.active_low;
+		data->entry.is_level = data->is_level = info->ioapic.is_level;
+		data->entry.active_low = data->active_low = info->ioapic.active_low;
 	}
 
 	return data->is_level == info->ioapic.is_level &&
@@ -1231,10 +1218,9 @@ void ioapic_zap_locks(void)
 
 static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
 {
-	int i;
-	char buf[256];
 	struct IO_APIC_route_entry entry;
-	struct IR_IO_APIC_route_entry *ir_entry = (void *)&entry;
+	char buf[256];
+	int i;
 
 	printk(KERN_DEBUG "IOAPIC %d:\n", apic);
 	for (i = 0; i <= nr_entries; i++) {
@@ -1242,20 +1228,20 @@ static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
 		snprintf(buf, sizeof(buf),
 			 " pin%02x, %s, %s, %s, V(%02X), IRR(%1d), S(%1d)",
 			 i,
-			 entry.mask == IOAPIC_MASKED ? "disabled" : "enabled ",
-			 entry.trigger == IOAPIC_LEVEL ? "level" : "edge ",
-			 entry.polarity == IOAPIC_POL_LOW ? "low " : "high",
+			 entry.masked ? "disabled" : "enabled ",
+			 entry.is_level ? "level" : "edge ",
+			 entry.active_low ? "low " : "high",
 			 entry.vector, entry.irr, entry.delivery_status);
-		if (ir_entry->format)
+		if (entry.ir_format) {
 			printk(KERN_DEBUG "%s, remapped, I(%04X),  Z(%X)\n",
-			       buf, (ir_entry->index2 << 15) | ir_entry->index,
-			       ir_entry->zero);
-		else
-			printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n",
 			       buf,
-			       entry.dest_mode == IOAPIC_DEST_MODE_LOGICAL ?
-			       "logical " : "physical",
-			       entry.dest, entry.delivery_mode);
+			       (entry.ir_index_15 << 15) | entry.ir_index_0_14,
+				entry.ir_zero);
+		} else {
+			printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n", buf,
+			       entry.dest_mode_logical ? "logical " : "physical",
+			       entry.destid_0_7, entry.delivery_mode);
+		}
 	}
 }
 
@@ -1380,8 +1366,8 @@ void __init enable_IO_APIC(void)
 		/* If the interrupt line is enabled and in ExtInt mode
 		 * I have found the pin where the i8259 is connected.
 		 */
-		if ((entry.mask == 0) &&
-		    (entry.delivery_mode == APIC_DELIVERY_MODE_EXTINT)) {
+		if (!entry.masked &&
+		    entry.delivery_mode == APIC_DELIVERY_MODE_EXTINT) {
 			ioapic_i8259.apic = apic;
 			ioapic_i8259.pin  = pin;
 			goto found_i8259;
@@ -1425,12 +1411,12 @@ void native_restore_boot_irq_mode(void)
 		struct IO_APIC_route_entry entry;
 
 		memset(&entry, 0, sizeof(entry));
-		entry.mask		= IOAPIC_UNMASKED;
-		entry.trigger		= IOAPIC_EDGE;
-		entry.polarity		= IOAPIC_POL_HIGH;
-		entry.dest_mode		= IOAPIC_DEST_MODE_PHYSICAL;
+		entry.masked		= false;
+		entry.is_level		= false;
+		entry.active_low	= false;
+		entry.dest_mode_logical	= false;
 		entry.delivery_mode	= APIC_DELIVERY_MODE_EXTINT;
-		entry.dest		= read_apic_id();
+		entry.destid_0_7	= read_apic_id();
 
 		/*
 		 * Add it to the IO-APIC irq-routing table:
@@ -1709,13 +1695,13 @@ static bool io_apic_level_ack_pending(struct mp_chip_data *data)
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
 	for_each_irq_pin(entry, data->irq_2_pin) {
-		unsigned int reg;
+		struct IO_APIC_route_entry e;
 		int pin;
 
 		pin = entry->pin;
-		reg = io_apic_read(entry->apic, 0x10 + pin*2);
+		e.w1 = io_apic_read(entry->apic, 0x10 + pin*2);
 		/* Is the remote IRR bit set? */
-		if (reg & IO_APIC_REDIR_REMOTE_IRR) {
+		if (e.irr) {
 			raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 			return true;
 		}
@@ -1874,7 +1860,7 @@ static void ioapic_configure_entry(struct irq_data *irqd)
 	 * ioapic chip to verify that.
 	 */
 	if (irqd->chip == &ioapic_chip) {
-		mpd->entry.dest = cfg->dest_apicid;
+		mpd->entry.destid_0_7 = cfg->dest_apicid;
 		mpd->entry.vector = cfg->vector;
 	}
 	for_each_irq_pin(entry, mpd->irq_2_pin)
@@ -1932,7 +1918,7 @@ static int ioapic_irq_get_chip_state(struct irq_data *irqd,
 		 * irrelevant because the IO-APIC treats them as fire and
 		 * forget.
 		 */
-		if (rentry.irr && rentry.trigger) {
+		if (rentry.irr && rentry.is_level) {
 			*state = true;
 			break;
 		}
@@ -2057,12 +2043,12 @@ static inline void __init unlock_ExtINT_logic(void)
 
 	memset(&entry1, 0, sizeof(entry1));
 
-	entry1.dest_mode = IOAPIC_DEST_MODE_PHYSICAL;
-	entry1.mask = IOAPIC_UNMASKED;
-	entry1.dest = hard_smp_processor_id();
-	entry1.delivery_mode = APIC_DELIVERY_MODE_EXTINT;
-	entry1.polarity = entry0.polarity;
-	entry1.trigger = IOAPIC_EDGE;
+	entry1.dest_mode_logical	= true;
+	entry1.masked			= false;
+	entry1.destid_0_7		= hard_smp_processor_id();
+	entry1.delivery_mode		= APIC_DELIVERY_MODE_EXTINT;
+	entry1.active_low		= entry0.active_low;
+	entry1.is_level			= false;
 	entry1.vector = 0;
 
 	ioapic_write_entry(apic, pin, entry1);
@@ -2937,17 +2923,17 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 			   struct IO_APIC_route_entry *entry)
 {
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode = apic->delivery_mode;
-	entry->dest_mode     = apic->dest_mode_logical;
-	entry->dest	     = cfg->dest_apicid;
-	entry->vector	     = cfg->vector;
-	entry->trigger	     = data->is_level;
-	entry->polarity	     = data->active_low;
+	entry->delivery_mode	 = apic->delivery_mode;
+	entry->dest_mode_logical = apic->dest_mode_logical;
+	entry->destid_0_7	 = cfg->dest_apicid;
+	entry->vector		 = cfg->vector;
+	entry->is_level		 = data->is_level;
+	entry->active_low	 = data->active_low;
 	/*
 	 * Mask level triggered irqs. Edge triggered irqs are masked
 	 * by the irq core code in case they fire.
 	 */
-	entry->mask = data->is_level;
+	entry->masked		= data->is_level;
 }
 
 int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b0e5210e53b2..3d72ec7bbbf8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3687,11 +3687,11 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 		entry = info->ioapic.entry;
 		info->ioapic.entry = NULL;
 		memset(entry, 0, sizeof(*entry));
-		entry->vector	= index;
-		entry->trigger	= info->ioapic.is_level;
-		entry->polarity	= info->ioapic.active_low;
+		entry->vector		= index;
+		entry->is_level		= info->ioapic.is_level;
+		entry->active_low	= info->ioapic.active_low;
 		/* Mask level triggered irqs. */
-		entry->mask	= info->ioapic.is_level;
+		entry->masked		= info->ioapic.is_level;
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e09e2d734c57..1ab7eb918a5c 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -52,7 +52,7 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 		return ret;
 
 	entry = data->chip_data;
-	entry->dest = cfg->dest_apicid;
+	entry->destid_0_7 = cfg->dest_apicid;
 	entry->vector = cfg->vector;
 	send_cleanup_vector(cfg);
 
@@ -125,7 +125,7 @@ static int hyperv_irq_remapping_activate(struct irq_domain *domain,
 	struct irq_cfg *cfg = irqd_cfg(irq_data);
 	struct IO_APIC_route_entry *entry = irq_data->chip_data;
 
-	entry->dest = cfg->dest_apicid;
+	entry->destid_0_7 = cfg->dest_apicid;
 	entry->vector = cfg->vector;
 
 	return 0;
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 54ca69333445..625bdb9f1627 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1279,8 +1279,8 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 					     struct irq_alloc_info *info,
 					     int index, int sub_handle)
 {
-	struct IR_IO_APIC_route_entry *entry;
 	struct irte *irte = &data->irte_entry;
+	struct IO_APIC_route_entry *entry;
 
 	prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
 	switch (info->type) {
@@ -1294,22 +1294,21 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 			irte->avail, irte->vector, irte->dest_id,
 			irte->sid, irte->sq, irte->svt);
 
-		entry = (struct IR_IO_APIC_route_entry *)info->ioapic.entry;
+		entry = info->ioapic.entry;
 		info->ioapic.entry = NULL;
 		memset(entry, 0, sizeof(*entry));
-		entry->index2	= (index >> 15) & 0x1;
-		entry->zero	= 0;
-		entry->format	= 1;
-		entry->index	= (index & 0x7fff);
+		entry->ir_index_15	= !!(index & 0x8000);
+		entry->ir_format	= true;
+		entry->ir_index_0_14	= index & 0x7fff;
 		/*
 		 * IO-APIC RTE will be configured with virtual vector.
 		 * irq handler will do the explicit EOI to the io-apic.
 		 */
-		entry->vector	= info->ioapic.pin;
-		entry->trigger	= info->ioapic.is_level;
-		entry->polarity	= info->ioapic.active_low;
+		entry->vector		= info->ioapic.pin;
+		entry->is_level		= info->ioapic.is_level;
+		entry->active_low	= info->ioapic.active_low;
 		/* Mask level triggered irqs. */
-		entry->mask	= info->ioapic.is_level;
+		entry->masked		= info->ioapic.is_level;
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
-- 
2.17.1


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields
  2020-10-24 12:44                 ` David Woodhouse
@ 2020-10-24 21:35                   ` David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 01/35] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
                                       ` (35 more replies)
  0 siblings, 36 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui


Fix the conditions for enabling x2apic on guests without interrupt 
remapping, and support 15-bit Extended Destination ID to allow 32768 
CPUs without IR on hypervisors that support it.

Make the I/OAPIC code generate its RTE directly from the MSI message
created by the parent irqchip, and fix up a bunch of magic mask/shift
macros to use bitfields for MSI messages and I/OAPIC RTEs while we're
at it.

v3:
 • Lots of bitfield cleanups from Thomas.
 • Disable hyperv-iommu if 15-bit extension is present.
 • Fix inconsistent CONFIG_PCI_MSI/CONFIG_GENERIC_MSI_IRQ in hpet.c
 • Split KVM_FEATURE_MSI_EXT_DEST_ID patch, half of which is going upstream
   through KVM tree (and the other half needs to wait, or have an #ifdef) so
   is left at the top of the tree.

v2:
 • Minor cleanups.
 • Move __irq_msi_compose_msg() to apic.c, make virt_ext_dest_id static.
 • Generate I/OAPIC RTE directly from parent irqchip's MSI messages.
 • Clean up HPET MSI support into hpet.c now that we can.

David Woodhouse (19):
      x86/apic: Fix x2apic enablement without interrupt remapping
      x86/msi: Only use high bits of MSI address for DMAR unit
      x86/apic: Always provide irq_compose_msi_msg() method for vector domain
      x86/hpet: Move MSI support into hpet.c
      x86/ioapic: Generate RTE directly from parent irqchip's MSI message
      genirq/irqdomain: Implement get_name() method on irqchip fwnodes
      x86/apic: Add select() method on vector irqdomain
      iommu/amd: Implement select() method on remapping irqdomain
      iommu/vt-d: Implement select() method on remapping irqdomain
      iommu/hyper-v: Implement select() method on remapping irqdomain
      x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
      x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
      x86: Kill all traces of irq_remapping_get_irq_domain()
      iommu/vt-d: Simplify intel_irq_remapping_select()
      x86/ioapic: Handle Extended Destination ID field in RTE
      x86/apic: Support 15 bits of APIC ID in MSI where available
      iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available
      x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID
      x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected

Thomas Gleixner (16):
      x86/apic/uv: Fix inconsistent destination mode
      x86/devicetree: Fix the ioapic interrupt type table
      x86/apic: Cleanup delivery mode defines
      x86/apic: Replace pointless apic::dest_logical usage
      x86/apic: Get rid of apic::dest_logical
      x86/apic: Cleanup destination mode
      genirq/msi: Allow shadow declarations of msi_msg::$member
      x86/msi: Provide msi message shadow structs
      iommu/intel: Use msi_msg shadow structs
      iommu/amd: Use msi_msg shadow structs
      PCI: vmd: Use msi_msg shadow structs
      x86/kvm: Use msi_msg shadow structs
      x86/pci/xen: Use msi_msg shadow structs
      x86/msi: Remove msidef.h
      x86/io_apic: Cleanup trigger/polarity helpers
      x86/ioapic: Cleanup IO/APIC route entry structs

 Documentation/virt/kvm/cpuid.rst      |   4 +
 arch/x86/include/asm/apic.h           |  16 +-
 arch/x86/include/asm/apicdef.h        |  16 +-
 arch/x86/include/asm/hpet.h           |  11 -
 arch/x86/include/asm/hw_irq.h         |  13 +-
 arch/x86/include/asm/io_apic.h        |  79 ++----
 arch/x86/include/asm/irq_remapping.h  |   9 -
 arch/x86/include/asm/irqdomain.h      |   3 +
 arch/x86/include/asm/msi.h            |  50 ++++
 arch/x86/include/asm/msidef.h         |  57 ----
 arch/x86/include/asm/x86_init.h       |   2 +
 arch/x86/include/uapi/asm/kvm_para.h  |   1 +
 arch/x86/kernel/apic/apic.c           |  73 ++++-
 arch/x86/kernel/apic/apic_flat_64.c   |  18 +-
 arch/x86/kernel/apic/apic_noop.c      |  10 +-
 arch/x86/kernel/apic/apic_numachip.c  |  16 +-
 arch/x86/kernel/apic/bigsmp_32.c      |   9 +-
 arch/x86/kernel/apic/io_apic.c        | 503 ++++++++++++++++++----------------
 arch/x86/kernel/apic/ipi.c            |   6 +-
 arch/x86/kernel/apic/msi.c            | 153 +----------
 arch/x86/kernel/apic/probe_32.c       |   9 +-
 arch/x86/kernel/apic/vector.c         |  49 ++++
 arch/x86/kernel/apic/x2apic_cluster.c |  10 +-
 arch/x86/kernel/apic/x2apic_phys.c    |  17 +-
 arch/x86/kernel/apic/x2apic_uv_x.c    |  12 +-
 arch/x86/kernel/devicetree.c          |  30 +-
 arch/x86/kernel/hpet.c                | 122 ++++++++-
 arch/x86/kernel/kvm.c                 |   6 +
 arch/x86/kernel/smpboot.c             |   8 +-
 arch/x86/kernel/x86_init.c            |   1 +
 arch/x86/kvm/irq_comm.c               |  31 +--
 arch/x86/pci/intel_mid_pci.c          |   8 +-
 arch/x86/pci/xen.c                    |  26 +-
 arch/x86/platform/uv/uv_irq.c         |   4 +-
 arch/x86/xen/apic.c                   |   7 +-
 drivers/iommu/amd/amd_iommu_types.h   |   2 +-
 drivers/iommu/amd/init.c              |  46 ++--
 drivers/iommu/amd/iommu.c             |  93 +++----
 drivers/iommu/hyperv-iommu.c          |  44 +--
 drivers/iommu/intel/irq_remapping.c   | 102 +++----
 drivers/iommu/irq_remapping.c         |  14 -
 drivers/iommu/irq_remapping.h         |   3 -
 drivers/pci/controller/pci-hyperv.c   |   6 +-
 drivers/pci/controller/vmd.c          |   9 +-
 include/asm-generic/msi.h             |   4 +
 include/linux/msi.h                   |  46 +++-
 kernel/irq/irqdomain.c                |  11 +-
 47 files changed, 890 insertions(+), 879 deletions(-)




^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v3 01/35] x86/apic: Fix x2apic enablement without interrupt remapping
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 02/35] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
                                       ` (34 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

Currently, Linux as a hypervisor guest will enable x2apic only if there are
no CPUs present at boot time with an APIC ID above 255.

Hotplugging a CPU later with a higher APIC ID would result in a CPU which
cannot be targeted by external interrupts.

Add a filter in x2apic_apic_id_valid() which can be used to prevent such
CPUs from coming online, and allow x2apic to be enabled even if they are
present at boot time.

Fixes: ce69a784504 ("x86/apic: Enable x2APIC without interrupt remapping under KVM")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/apic.h        |  1 +
 arch/x86/kernel/apic/apic.c        | 14 ++++++++------
 arch/x86/kernel/apic/x2apic_phys.c |  9 +++++++++
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 1c129abb7f09..b0fd204e0023 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -259,6 +259,7 @@ static inline u64 native_x2apic_icr_read(void)
 
 extern int x2apic_mode;
 extern int x2apic_phys;
+extern void __init x2apic_set_max_apicid(u32 apicid);
 extern void __init check_x2apic(void);
 extern void x2apic_setup(void);
 static inline int x2apic_enabled(void)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b3eef1d5c903..113f6ca7b828 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1841,20 +1841,22 @@ static __init void try_to_enable_x2apic(int remap_mode)
 		return;
 
 	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
-		/* IR is required if there is APIC ID > 255 even when running
-		 * under KVM
+		/*
+		 * Using X2APIC without IR is not architecturally supported
+		 * on bare metal but may be supported in guests.
 		 */
-		if (max_physical_apicid > 255 ||
-		    !x86_init.hyper.x2apic_available()) {
+		if (!x86_init.hyper.x2apic_available()) {
 			pr_info("x2apic: IRQ remapping doesn't support X2APIC mode\n");
 			x2apic_disable();
 			return;
 		}
 
 		/*
-		 * without IR all CPUs can be addressed by IOAPIC/MSI
-		 * only in physical mode
+		 * Without IR, all CPUs can be addressed by IOAPIC/MSI only
+		 * in physical mode, and CPUs with an APIC ID that cannnot
+		 * be addressed must not be brought online.
 		 */
+		x2apic_set_max_apicid(255);
 		x2apic_phys = 1;
 	}
 	x2apic_enable();
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index bc9693841353..e14eae6d6ea7 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -8,6 +8,12 @@
 int x2apic_phys;
 
 static struct apic apic_x2apic_phys;
+static u32 x2apic_max_apicid __ro_after_init;
+
+void __init x2apic_set_max_apicid(u32 apicid)
+{
+	x2apic_max_apicid = apicid;
+}
 
 static int __init set_x2apic_phys_mode(char *arg)
 {
@@ -98,6 +104,9 @@ static int x2apic_phys_probe(void)
 /* Common x2apic functions, also used by x2apic_cluster */
 int x2apic_apic_id_valid(u32 apicid)
 {
+	if (x2apic_max_apicid && apicid > x2apic_max_apicid)
+		return 0;
+
 	return 1;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 02/35] x86/msi: Only use high bits of MSI address for DMAR unit
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 01/35] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 03/35] x86/apic/uv: Fix inconsistent destination mode David Woodhouse
                                       ` (33 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

The Intel IOMMU has an MSI-like configuration for its interrupt, but it
isn't really MSI. So it gets to abuse the high 32 bits of the address, and
puts the high 24 bits of the extended APIC ID there.

This isn't something that can be used in the general case for real MSIs,
since external devices using the high bits of the address would be
performing writes to actual memory space above 4GiB, not targeted at the
APIC.

Factor the hack out and allow it only to be used when appropriate, adding a
WARN_ON_ONCE() if other MSIs are targeted at an unreachable APIC ID. That
should never happen since the compatibility MSI messages are not used when
Interrupt Remapping is enabled.

The x2apic_enabled() check isn't needed because Linux won't bring up CPUs
with higher APIC IDs unless IR and x2apic are enabled anyway.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/msi.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 6313f0a05db7..516df47bde73 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,13 +23,11 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
-static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
+static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+				  bool dmar)
 {
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
-	if (x2apic_enabled())
-		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
-
 	msg->address_lo =
 		MSI_ADDR_BASE_LO |
 		((apic->irq_dest_mode == 0) ?
@@ -43,18 +41,29 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 		MSI_DATA_LEVEL_ASSERT |
 		MSI_DATA_DELIVERY_FIXED |
 		MSI_DATA_VECTOR(cfg->vector);
+
+	/*
+	 * Only the IOMMU itself can use the trick of putting destination
+	 * APIC ID into the high bits of the address. Anything else would
+	 * just be writing to memory if it tried that, and needs IR to
+	 * address higher APIC IDs.
+	 */
+	if (dmar)
+		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+	else
+		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
 }
 
 void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
-	__irq_msi_compose_msg(irqd_cfg(data), msg);
+	__irq_msi_compose_msg(irqd_cfg(data), msg, false);
 }
 
 static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 {
 	struct msi_msg msg[2] = { [1] = { }, };
 
-	__irq_msi_compose_msg(cfg, msg);
+	__irq_msi_compose_msg(cfg, msg, false);
 	irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
 }
 
@@ -276,6 +285,17 @@ struct irq_domain *arch_create_remap_msi_irq_domain(struct irq_domain *parent,
 #endif
 
 #ifdef CONFIG_DMAR_TABLE
+/*
+ * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the
+ * high bits of the destination APIC ID. This can't be done in the general
+ * case for MSIs as it would be targeting real memory above 4GiB not the
+ * APIC.
+ */
+static void dmar_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	__irq_msi_compose_msg(irqd_cfg(data), msg, true);
+}
+
 static void dmar_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
 {
 	dmar_msi_write(data->irq, msg);
@@ -288,6 +308,7 @@ static struct irq_chip dmar_msi_controller = {
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_set_affinity	= msi_domain_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_compose_msi_msg	= dmar_msi_compose_msg,
 	.irq_write_msi_msg	= dmar_msi_write_msg,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 03/35] x86/apic/uv: Fix inconsistent destination mode
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 01/35] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 02/35] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  2020-10-24 21:35                     ` [PATCH v3 04/35] x86/devicetree: Fix the ioapic interrupt type table David Woodhouse
                                       ` (32 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

The UV x2apic is strictly using physical destination mode, but
apic::dest_logical is initialized with APIC_DEST_LOGICAL.

This does not matter much because UV does not use any of the generic
functions which use apic::dest_logical, but is still inconsistent.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kernel/apic/x2apic_uv_x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index 714233cee0b5..9ade9e6a95ff 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -811,7 +811,7 @@ static struct apic apic_x2apic_uv_x __ro_after_init = {
 	.irq_dest_mode			= 0, /* Physical */
 
 	.disable_esr			= 0,
-	.dest_logical			= APIC_DEST_LOGICAL,
+	.dest_logical			= APIC_DEST_PHYSICAL,
 	.check_apicid_used		= NULL,
 
 	.init_apic_ldr			= uv_init_apic_ldr,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 04/35] x86/devicetree: Fix the ioapic interrupt type table
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (2 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 03/35] x86/apic/uv: Fix inconsistent destination mode David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  2020-10-24 21:35                     ` [PATCH v3 05/35] x86/apic: Cleanup delivery mode defines David Woodhouse
                                       ` (31 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

The ioapic interrupt type table is wrong as it assumes that polarity in
IO/APIC context means active high when set. But the IO/APIC polarity is
working the other way round. This works because the ordering of the entries
is consistent with the device tree and the type information is not used by
the IO/APIC interrupt chip.

The whole trigger and polarity business of IO/APIC is misleading and the
corresponding constants which are defined as 0/1 are not used consistently
and are going to be removed.

Rename the type table members to 'is_level' and 'active_low' and adjust the
type information for consistency sake.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kernel/devicetree.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/devicetree.c b/arch/x86/kernel/devicetree.c
index ddffd80f5c52..6a4cb71c2498 100644
--- a/arch/x86/kernel/devicetree.c
+++ b/arch/x86/kernel/devicetree.c
@@ -184,31 +184,31 @@ static unsigned int ioapic_id;
 
 struct of_ioapic_type {
 	u32 out_type;
-	u32 trigger;
-	u32 polarity;
+	u32 is_level;
+	u32 active_low;
 };
 
 static struct of_ioapic_type of_ioapic_type[] =
 {
 	{
-		.out_type	= IRQ_TYPE_EDGE_RISING,
-		.trigger	= IOAPIC_EDGE,
-		.polarity	= 1,
+		.out_type	= IRQ_TYPE_EDGE_FALLING,
+		.is_level	= 0,
+		.active_low	= 1,
 	},
 	{
-		.out_type	= IRQ_TYPE_LEVEL_LOW,
-		.trigger	= IOAPIC_LEVEL,
-		.polarity	= 0,
+		.out_type	= IRQ_TYPE_LEVEL_HIGH,
+		.is_level	= 1,
+		.active_low	= 0,
 	},
 	{
-		.out_type	= IRQ_TYPE_LEVEL_HIGH,
-		.trigger	= IOAPIC_LEVEL,
-		.polarity	= 1,
+		.out_type	= IRQ_TYPE_LEVEL_LOW,
+		.is_level	= 1,
+		.active_low	= 1,
 	},
 	{
-		.out_type	= IRQ_TYPE_EDGE_FALLING,
-		.trigger	= IOAPIC_EDGE,
-		.polarity	= 0,
+		.out_type	= IRQ_TYPE_EDGE_RISING,
+		.is_level	= 0,
+		.active_low	= 0,
 	},
 };
 
@@ -228,7 +228,7 @@ static int dt_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 		return -EINVAL;
 
 	it = &of_ioapic_type[type_index];
-	ioapic_set_alloc_attr(&tmp, NUMA_NO_NODE, it->trigger, it->polarity);
+	ioapic_set_alloc_attr(&tmp, NUMA_NO_NODE, it->is_level, it->active_low);
 	tmp.devid = mpc_ioapic_id(mp_irqdomain_ioapic_idx(domain));
 	tmp.ioapic.pin = fwspec->param[0];
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 05/35] x86/apic: Cleanup delivery mode defines
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (3 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 04/35] x86/devicetree: Fix the ioapic interrupt type table David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  2020-10-24 21:35                     ` [PATCH v3 06/35] x86/apic: Replace pointless apic::dest_logical usage David Woodhouse
                                       ` (30 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

The enum ioapic_irq_destination_types and the enumerated constants starting
with 'dest_' are gross misnomers because they describe the delivery mode.

Rename then enum and the constants so they actually make sense.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/apic.h           |  3 ++-
 arch/x86/include/asm/apicdef.h        | 16 +++++++---------
 arch/x86/kernel/apic/apic_flat_64.c   |  4 ++--
 arch/x86/kernel/apic/apic_noop.c      |  2 +-
 arch/x86/kernel/apic/apic_numachip.c  |  4 ++--
 arch/x86/kernel/apic/bigsmp_32.c      |  2 +-
 arch/x86/kernel/apic/io_apic.c        | 11 ++++++-----
 arch/x86/kernel/apic/probe_32.c       |  2 +-
 arch/x86/kernel/apic/x2apic_cluster.c |  2 +-
 arch/x86/kernel/apic/x2apic_phys.c    |  2 +-
 arch/x86/kernel/apic/x2apic_uv_x.c    |  6 +++---
 arch/x86/platform/uv/uv_irq.c         |  2 +-
 drivers/iommu/amd/iommu.c             |  4 ++--
 drivers/iommu/intel/irq_remapping.c   |  2 +-
 drivers/pci/controller/pci-hyperv.c   |  6 +++---
 15 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index b0fd204e0023..53bb62e0fdd5 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -309,7 +309,8 @@ struct apic {
 	/* dest_logical is used by the IPI functions */
 	u32	dest_logical;
 	u32	disable_esr;
-	u32	irq_delivery_mode;
+
+	enum apic_delivery_modes delivery_mode;
 	u32	irq_dest_mode;
 
 	u32	(*calc_dest_apicid)(unsigned int cpu);
diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index 05e694ed8386..5716f22f81ac 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -432,15 +432,13 @@ struct local_apic {
  #define BAD_APICID 0xFFFFu
 #endif
 
-enum ioapic_irq_destination_types {
-	dest_Fixed		= 0,
-	dest_LowestPrio		= 1,
-	dest_SMI		= 2,
-	dest__reserved_1	= 3,
-	dest_NMI		= 4,
-	dest_INIT		= 5,
-	dest__reserved_2	= 6,
-	dest_ExtINT		= 7
+enum apic_delivery_modes {
+	APIC_DELIVERY_MODE_FIXED	= 0,
+	APIC_DELIVERY_MODE_LOWESTPRIO   = 1,
+	APIC_DELIVERY_MODE_SMI		= 2,
+	APIC_DELIVERY_MODE_NMI		= 4,
+	APIC_DELIVERY_MODE_INIT		= 5,
+	APIC_DELIVERY_MODE_EXTINT	= 7,
 };
 
 #endif /* _ASM_X86_APICDEF_H */
diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index 7862b152a052..fdd38a17f835 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -113,7 +113,7 @@ static struct apic apic_flat __ro_after_init = {
 	.apic_id_valid			= default_apic_id_valid,
 	.apic_id_registered		= flat_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	.irq_dest_mode			= 1, /* logical */
 
 	.disable_esr			= 0,
@@ -206,7 +206,7 @@ static struct apic apic_physflat __ro_after_init = {
 	.apic_id_valid			= default_apic_id_valid,
 	.apic_id_registered		= flat_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
index 780c702969b7..4fc934b11851 100644
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -95,7 +95,7 @@ struct apic apic_noop __ro_after_init = {
 	.apic_id_valid			= default_apic_id_valid,
 	.apic_id_registered		= noop_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	/* logical delivery broadcast to all CPUs: */
 	.irq_dest_mode			= 1,
 
diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c
index 35edd57f064a..db715d082ec9 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -246,7 +246,7 @@ static const struct apic apic_numachip1 __refconst = {
 	.apic_id_valid			= numachip_apic_id_valid,
 	.apic_id_registered		= numachip_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
@@ -295,7 +295,7 @@ static const struct apic apic_numachip2 __refconst = {
 	.apic_id_valid			= numachip_apic_id_valid,
 	.apic_id_registered		= numachip_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
diff --git a/arch/x86/kernel/apic/bigsmp_32.c b/arch/x86/kernel/apic/bigsmp_32.c
index 98d015a4405a..7f6461f5d349 100644
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -127,7 +127,7 @@ static struct apic apic_bigsmp __ro_after_init = {
 	.apic_id_valid			= default_apic_id_valid,
 	.apic_id_registered		= bigsmp_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	/* phys delivery to target CPU: */
 	.irq_dest_mode			= 0,
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 7b3c7e0d4a09..cff6cbc3d183 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -535,7 +535,7 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 
 	/* Check delivery_mode to be sure we're not clearing an SMI pin */
 	entry = ioapic_read_entry(apic, pin);
-	if (entry.delivery_mode == dest_SMI)
+	if (entry.delivery_mode == APIC_DELIVERY_MODE_SMI)
 		return;
 
 	/*
@@ -1368,7 +1368,8 @@ void __init enable_IO_APIC(void)
 		/* If the interrupt line is enabled and in ExtInt mode
 		 * I have found the pin where the i8259 is connected.
 		 */
-		if ((entry.mask == 0) && (entry.delivery_mode == dest_ExtINT)) {
+		if ((entry.mask == 0) &&
+		    (entry.delivery_mode == APIC_DELIVERY_MODE_EXTINT)) {
 			ioapic_i8259.apic = apic;
 			ioapic_i8259.pin  = pin;
 			goto found_i8259;
@@ -1416,7 +1417,7 @@ void native_restore_boot_irq_mode(void)
 		entry.trigger		= IOAPIC_EDGE;
 		entry.polarity		= IOAPIC_POL_HIGH;
 		entry.dest_mode		= IOAPIC_DEST_MODE_PHYSICAL;
-		entry.delivery_mode	= dest_ExtINT;
+		entry.delivery_mode	= APIC_DELIVERY_MODE_EXTINT;
 		entry.dest		= read_apic_id();
 
 		/*
@@ -2047,7 +2048,7 @@ static inline void __init unlock_ExtINT_logic(void)
 	entry1.dest_mode = IOAPIC_DEST_MODE_PHYSICAL;
 	entry1.mask = IOAPIC_UNMASKED;
 	entry1.dest = hard_smp_processor_id();
-	entry1.delivery_mode = dest_ExtINT;
+	entry1.delivery_mode = APIC_DELIVERY_MODE_EXTINT;
 	entry1.polarity = entry0.polarity;
 	entry1.trigger = IOAPIC_EDGE;
 	entry1.vector = 0;
@@ -2948,7 +2949,7 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 			   struct IO_APIC_route_entry *entry)
 {
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode = apic->irq_delivery_mode;
+	entry->delivery_mode = apic->delivery_mode;
 	entry->dest_mode     = apic->irq_dest_mode;
 	entry->dest	     = cfg->dest_apicid;
 	entry->vector	     = cfg->vector;
diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index 67b6f7c049ec..77c6e2e04a1f 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -69,7 +69,7 @@ static struct apic apic_default __ro_after_init = {
 	.apic_id_valid			= default_apic_id_valid,
 	.apic_id_registered		= default_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	/* logical delivery broadcast to all CPUs: */
 	.irq_dest_mode			= 1,
 
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index b0889c48a2ac..82fb43d807f7 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -184,7 +184,7 @@ static struct apic apic_x2apic_cluster __ro_after_init = {
 	.apic_id_valid			= x2apic_apic_id_valid,
 	.apic_id_registered		= x2apic_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	.irq_dest_mode			= 1, /* logical */
 
 	.disable_esr			= 0,
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index e14eae6d6ea7..437e8439db67 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -157,7 +157,7 @@ static struct apic apic_x2apic_phys __ro_after_init = {
 	.apic_id_valid			= x2apic_apic_id_valid,
 	.apic_id_registered		= x2apic_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index 9ade9e6a95ff..49deefdded68 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -703,9 +703,9 @@ static void uv_send_IPI_one(int cpu, int vector)
 	unsigned long dmode, val;
 
 	if (vector == NMI_VECTOR)
-		dmode = dest_NMI;
+		dmode = APIC_DELIVERY_MODE_NMI;
 	else
-		dmode = dest_Fixed;
+		dmode = APIC_DELIVERY_MODE_FIXED;
 
 	val = (1UL << UVH_IPI_INT_SEND_SHFT) |
 		(apicid << UVH_IPI_INT_APIC_ID_SHFT) |
@@ -807,7 +807,7 @@ static struct apic apic_x2apic_uv_x __ro_after_init = {
 	.apic_id_valid			= uv_apic_id_valid,
 	.apic_id_registered		= uv_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	.irq_dest_mode			= 0, /* Physical */
 
 	.disable_esr			= 0,
diff --git a/arch/x86/platform/uv/uv_irq.c b/arch/x86/platform/uv/uv_irq.c
index 18ca2261cc9a..e7020d162949 100644
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -35,7 +35,7 @@ static void uv_program_mmr(struct irq_cfg *cfg, struct uv_irq_2_mmr_pnode *info)
 	mmr_value = 0;
 	entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
 	entry->vector		= cfg->vector;
-	entry->delivery_mode	= apic->irq_delivery_mode;
+	entry->delivery_mode	= apic->delivery_mode;
 	entry->dest_mode	= apic->irq_dest_mode;
 	entry->polarity		= 0;
 	entry->trigger		= 0;
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b9cf59443843..bc81b91f89fe 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3671,7 +3671,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 
 	data->irq_2_irte.devid = devid;
 	data->irq_2_irte.index = index + sub_handle;
-	iommu->irte_ops->prepare(data->entry, apic->irq_delivery_mode,
+	iommu->irte_ops->prepare(data->entry, apic->delivery_mode,
 				 apic->irq_dest_mode, irq_cfg->vector,
 				 irq_cfg->dest_apicid, devid);
 
@@ -3944,7 +3944,7 @@ int amd_iommu_deactivate_guest_mode(void *data)
 
 	entry->lo.fields_remap.valid       = valid;
 	entry->lo.fields_remap.dm          = apic->irq_dest_mode;
-	entry->lo.fields_remap.int_type    = apic->irq_delivery_mode;
+	entry->lo.fields_remap.int_type    = apic->delivery_mode;
 	entry->hi.fields.vector            = cfg->vector;
 	entry->lo.fields_remap.destination =
 				APICID_TO_IRTE_DEST_LO(cfg->dest_apicid);
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 0cfce1d3b7bb..d44e719d1984 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1122,7 +1122,7 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 	 * irq migration in the presence of interrupt-remapping.
 	*/
 	irte->trigger_mode = 0;
-	irte->dlvry_mode = apic->irq_delivery_mode;
+	irte->dlvry_mode = apic->delivery_mode;
 	irte->vector = vector;
 	irte->dest_id = IRTE_DEST(dest);
 	irte->redir_hint = 1;
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 4e992403fffe..92b57ac50a06 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1226,7 +1226,7 @@ static void hv_irq_unmask(struct irq_data *data)
 	params->int_target.vector = cfg->vector;
 
 	/*
-	 * Honoring apic->irq_delivery_mode set to dest_Fixed by
+	 * Honoring apic->delivery_mode set to APIC_DELIVERY_MODE_FIXED by
 	 * setting the HV_DEVICE_INTERRUPT_TARGET_MULTICAST flag results in a
 	 * spurious interrupt storm. Not doing so does not seem to have a
 	 * negative effect (yet?).
@@ -1310,7 +1310,7 @@ static u32 hv_compose_msi_req_v1(
 	int_pkt->wslot.slot = slot;
 	int_pkt->int_desc.vector = vector;
 	int_pkt->int_desc.vector_count = 1;
-	int_pkt->int_desc.delivery_mode = dest_Fixed;
+	int_pkt->int_desc.delivery_mode = APIC_DELIVERY_MODE_FIXED;
 
 	/*
 	 * Create MSI w/ dummy vCPU set, overwritten by subsequent retarget in
@@ -1331,7 +1331,7 @@ static u32 hv_compose_msi_req_v2(
 	int_pkt->wslot.slot = slot;
 	int_pkt->int_desc.vector = vector;
 	int_pkt->int_desc.vector_count = 1;
-	int_pkt->int_desc.delivery_mode = dest_Fixed;
+	int_pkt->int_desc.delivery_mode = APIC_DELIVERY_MODE_FIXED;
 
 	/*
 	 * Create MSI w/ dummy vCPU set targeting just one vCPU, overwritten
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 06/35] x86/apic: Replace pointless apic::dest_logical usage
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (4 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 05/35] x86/apic: Cleanup delivery mode defines David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] x86/apic: Replace pointless apic:: Dest_logical usage tip-bot2 for Thomas Gleixner
  2020-10-24 21:35                     ` [PATCH v3 07/35] x86/apic: Get rid of apic::dest_logical David Woodhouse
                                       ` (29 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

All these functions are only used for logical destination mode. So reading
the destination mode mask from the apic structure is a pointless
exercise. Just hand in the proper constant: APIC_DEST_LOGICAL.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kernel/apic/apic_flat_64.c   | 2 +-
 arch/x86/kernel/apic/ipi.c            | 6 +++---
 arch/x86/kernel/apic/x2apic_cluster.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index fdd38a17f835..6df837fd5081 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -53,7 +53,7 @@ static void _flat_send_IPI_mask(unsigned long mask, int vector)
 	unsigned long flags;
 
 	local_irq_save(flags);
-	__default_send_IPI_dest_field(mask, vector, apic->dest_logical);
+	__default_send_IPI_dest_field(mask, vector, APIC_DEST_LOGICAL);
 	local_irq_restore(flags);
 }
 
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 387154e39e08..d1fb874fbe64 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -260,7 +260,7 @@ void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
 	for_each_cpu(query_cpu, mask)
 		__default_send_IPI_dest_field(
 			early_per_cpu(x86_cpu_to_logical_apicid, query_cpu),
-			vector, apic->dest_logical);
+			vector, APIC_DEST_LOGICAL);
 	local_irq_restore(flags);
 }
 
@@ -279,7 +279,7 @@ void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
 			continue;
 		__default_send_IPI_dest_field(
 			early_per_cpu(x86_cpu_to_logical_apicid, query_cpu),
-			vector, apic->dest_logical);
+			vector, APIC_DEST_LOGICAL);
 		}
 	local_irq_restore(flags);
 }
@@ -297,7 +297,7 @@ void default_send_IPI_mask_logical(const struct cpumask *cpumask, int vector)
 
 	local_irq_save(flags);
 	WARN_ON(mask & ~cpumask_bits(cpu_online_mask)[0]);
-	__default_send_IPI_dest_field(mask, vector, apic->dest_logical);
+	__default_send_IPI_dest_field(mask, vector, APIC_DEST_LOGICAL);
 	local_irq_restore(flags);
 }
 
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 82fb43d807f7..f77e9fb7aac1 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -61,7 +61,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
 		if (!dest)
 			continue;
 
-		__x2apic_send_IPI_dest(dest, vector, apic->dest_logical);
+		__x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL);
 		/* Remove cluster CPUs from tmpmask */
 		cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask);
 	}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 07/35] x86/apic: Get rid of apic::dest_logical
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (5 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 06/35] x86/apic: Replace pointless apic::dest_logical usage David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] x86/apic: Get rid of apic:: Dest_logical tip-bot2 for Thomas Gleixner
  2020-10-24 21:35                     ` [PATCH v3 08/35] x86/apic: Cleanup destination mode David Woodhouse
                                       ` (28 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

struct apic has two members which store information about the destination
mode: dest_logical and irq_dest_mode.

dest_logical contains a mask which was historically used to set the
destination mode in IPI messages. Over time the usage was reduced and the
logical/physical functions were seperated.

There are only a few places which still use 'dest_logical' but they can
ues 'irq_dest_mode' instead.

irq_dest_mode is actually a boolean where 0 means physical destination mode
and 1 means logical destination mode. Of course the name does not reflect
the functionality. This will be cleaned up in a subsequent change.

Remove apic::dest_logical and fixup the remaining users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/apic.h           | 2 --
 arch/x86/kernel/apic/apic.c           | 2 +-
 arch/x86/kernel/apic/apic_flat_64.c   | 8 ++------
 arch/x86/kernel/apic/apic_noop.c      | 4 +---
 arch/x86/kernel/apic/apic_numachip.c  | 8 ++------
 arch/x86/kernel/apic/bigsmp_32.c      | 4 +---
 arch/x86/kernel/apic/probe_32.c       | 4 +---
 arch/x86/kernel/apic/x2apic_cluster.c | 4 +---
 arch/x86/kernel/apic/x2apic_phys.c    | 4 +---
 arch/x86/kernel/apic/x2apic_uv_x.c    | 4 +---
 arch/x86/kernel/smpboot.c             | 5 +++--
 arch/x86/xen/apic.c                   | 4 +---
 12 files changed, 15 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 53bb62e0fdd5..019d7ac3b16e 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -306,8 +306,6 @@ struct apic {
 	void	(*send_IPI_all)(int vector);
 	void	(*send_IPI_self)(int vector);
 
-	/* dest_logical is used by the IPI functions */
-	u32	dest_logical;
 	u32	disable_esr;
 
 	enum apic_delivery_modes delivery_mode;
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 113f6ca7b828..29d28b34cb2f 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1591,7 +1591,7 @@ static void setup_local_APIC(void)
 	apic->init_apic_ldr();
 
 #ifdef CONFIG_X86_32
-	if (apic->dest_logical) {
+	if (apic->irq_dest_mode == 1) {
 		int logical_apicid, ldr_apicid;
 
 		/*
diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index 6df837fd5081..bbb1b89fe711 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -117,11 +117,9 @@ static struct apic apic_flat __ro_after_init = {
 	.irq_dest_mode			= 1, /* logical */
 
 	.disable_esr			= 0,
-	.dest_logical			= APIC_DEST_LOGICAL,
-	.check_apicid_used		= NULL,
 
+	.check_apicid_used		= NULL,
 	.init_apic_ldr			= flat_init_apic_ldr,
-
 	.ioapic_phys_id_map		= NULL,
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
@@ -210,11 +208,9 @@ static struct apic apic_physflat __ro_after_init = {
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
-	.dest_logical			= 0,
-	.check_apicid_used		= NULL,
 
+	.check_apicid_used		= NULL,
 	.init_apic_ldr			= physflat_init_apic_ldr,
-
 	.ioapic_phys_id_map		= NULL,
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
index 4fc934b11851..38f167ce5031 100644
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -100,11 +100,9 @@ struct apic apic_noop __ro_after_init = {
 	.irq_dest_mode			= 1,
 
 	.disable_esr			= 0,
-	.dest_logical			= APIC_DEST_LOGICAL,
-	.check_apicid_used		= default_check_apicid_used,
 
+	.check_apicid_used		= default_check_apicid_used,
 	.init_apic_ldr			= noop_init_apic_ldr,
-
 	.ioapic_phys_id_map		= default_ioapic_phys_id_map,
 	.setup_apic_routing		= NULL,
 
diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c
index db715d082ec9..4ebf9fe2c95d 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -250,11 +250,9 @@ static const struct apic apic_numachip1 __refconst = {
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
-	.dest_logical			= 0,
-	.check_apicid_used		= NULL,
 
+	.check_apicid_used		= NULL,
 	.init_apic_ldr			= flat_init_apic_ldr,
-
 	.ioapic_phys_id_map		= NULL,
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
@@ -299,11 +297,9 @@ static const struct apic apic_numachip2 __refconst = {
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
-	.dest_logical			= 0,
-	.check_apicid_used		= NULL,
 
+	.check_apicid_used		= NULL,
 	.init_apic_ldr			= flat_init_apic_ldr,
-
 	.ioapic_phys_id_map		= NULL,
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/apic/bigsmp_32.c b/arch/x86/kernel/apic/bigsmp_32.c
index 7f6461f5d349..64c375b8c54e 100644
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -132,11 +132,9 @@ static struct apic apic_bigsmp __ro_after_init = {
 	.irq_dest_mode			= 0,
 
 	.disable_esr			= 1,
-	.dest_logical			= 0,
-	.check_apicid_used		= bigsmp_check_apicid_used,
 
+	.check_apicid_used		= bigsmp_check_apicid_used,
 	.init_apic_ldr			= bigsmp_init_apic_ldr,
-
 	.ioapic_phys_id_map		= bigsmp_ioapic_phys_id_map,
 	.setup_apic_routing		= bigsmp_setup_apic_routing,
 	.cpu_present_to_apicid		= bigsmp_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index 77c6e2e04a1f..97652aacf3e1 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -74,11 +74,9 @@ static struct apic apic_default __ro_after_init = {
 	.irq_dest_mode			= 1,
 
 	.disable_esr			= 0,
-	.dest_logical			= APIC_DEST_LOGICAL,
-	.check_apicid_used		= default_check_apicid_used,
 
+	.check_apicid_used		= default_check_apicid_used,
 	.init_apic_ldr			= default_init_apic_ldr,
-
 	.ioapic_phys_id_map		= default_ioapic_phys_id_map,
 	.setup_apic_routing		= setup_apic_flat_routing,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index f77e9fb7aac1..53390fc9f51e 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -188,11 +188,9 @@ static struct apic apic_x2apic_cluster __ro_after_init = {
 	.irq_dest_mode			= 1, /* logical */
 
 	.disable_esr			= 0,
-	.dest_logical			= APIC_DEST_LOGICAL,
-	.check_apicid_used		= NULL,
 
+	.check_apicid_used		= NULL,
 	.init_apic_ldr			= init_x2apic_ldr,
-
 	.ioapic_phys_id_map		= NULL,
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index 437e8439db67..ee0c4d08092c 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -161,11 +161,9 @@ static struct apic apic_x2apic_phys __ro_after_init = {
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
-	.dest_logical			= 0,
-	.check_apicid_used		= NULL,
 
+	.check_apicid_used		= NULL,
 	.init_apic_ldr			= init_x2apic_ldr,
-
 	.ioapic_phys_id_map		= NULL,
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index 49deefdded68..d21a6853afee 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -811,11 +811,9 @@ static struct apic apic_x2apic_uv_x __ro_after_init = {
 	.irq_dest_mode			= 0, /* Physical */
 
 	.disable_esr			= 0,
-	.dest_logical			= APIC_DEST_PHYSICAL,
-	.check_apicid_used		= NULL,
 
+	.check_apicid_used		= NULL,
 	.init_apic_ldr			= uv_init_apic_ldr,
-
 	.ioapic_phys_id_map		= NULL,
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index de776b2e6046..6c14f1091f60 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -747,13 +747,14 @@ static void __init smp_quirk_init_udelay(void)
 int
 wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip)
 {
+	u32 dm = apic->irq_dest_mode ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
 	unsigned long send_status, accept_status = 0;
 	int maxlvt;
 
 	/* Target chip */
 	/* Boot on the stack */
 	/* Kick the second */
-	apic_icr_write(APIC_DM_NMI | apic->dest_logical, apicid);
+	apic_icr_write(APIC_DM_NMI | dm, apicid);
 
 	pr_debug("Waiting for send to finish...\n");
 	send_status = safe_apic_wait_icr_idle();
@@ -980,7 +981,7 @@ wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
 	if (!boot_error) {
 		enable_start_cpu0 = 1;
 		*cpu0_nmi_registered = 1;
-		if (apic->dest_logical == APIC_DEST_LOGICAL)
+		if (apic->irq_dest_mode)
 			id = cpu0_logical_apicid;
 		else
 			id = apicid;
diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index e82fd1910dae..c35c24b5bc01 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -152,11 +152,9 @@ static struct apic xen_pv_apic = {
 	/* .irq_dest_mode     - used in native_compose_msi_msg only */
 
 	.disable_esr			= 0,
-	/* .dest_logical      -  default_send_IPI_ use it but we use our own. */
-	.check_apicid_used		= default_check_apicid_used, /* Used on 32-bit */
 
+	.check_apicid_used		= default_check_apicid_used, /* Used on 32-bit */
 	.init_apic_ldr			= xen_noop, /* setup_local_APIC calls it */
-
 	.ioapic_phys_id_map		= default_ioapic_phys_id_map, /* Used on 32-bit */
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= xen_cpu_present_to_apicid,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 08/35] x86/apic: Cleanup destination mode
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (6 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 07/35] x86/apic: Get rid of apic::dest_logical David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  2020-10-24 21:35                     ` [PATCH v3 09/35] x86/apic: Always provide irq_compose_msi_msg() method for vector domain David Woodhouse
                                       ` (27 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

apic::irq_dest_mode is actually a boolean, but defined as u32 and named in
a way which does not explain what it means.

Make it a boolean and rename it to 'dest_mode_logical'

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/apic.h           | 2 +-
 arch/x86/kernel/apic/apic.c           | 2 +-
 arch/x86/kernel/apic/apic_flat_64.c   | 4 ++--
 arch/x86/kernel/apic/apic_noop.c      | 4 +---
 arch/x86/kernel/apic/apic_numachip.c  | 4 ++--
 arch/x86/kernel/apic/bigsmp_32.c      | 3 +--
 arch/x86/kernel/apic/io_apic.c        | 2 +-
 arch/x86/kernel/apic/msi.c            | 6 +++---
 arch/x86/kernel/apic/probe_32.c       | 3 +--
 arch/x86/kernel/apic/x2apic_cluster.c | 2 +-
 arch/x86/kernel/apic/x2apic_phys.c    | 2 +-
 arch/x86/kernel/apic/x2apic_uv_x.c    | 2 +-
 arch/x86/kernel/smpboot.c             | 7 ++-----
 arch/x86/platform/uv/uv_irq.c         | 2 +-
 arch/x86/xen/apic.c                   | 3 +--
 drivers/iommu/amd/amd_iommu_types.h   | 2 +-
 drivers/iommu/amd/iommu.c             | 8 ++++----
 drivers/iommu/intel/irq_remapping.c   | 2 +-
 18 files changed, 26 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 019d7ac3b16e..652f62252349 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -309,7 +309,7 @@ struct apic {
 	u32	disable_esr;
 
 	enum apic_delivery_modes delivery_mode;
-	u32	irq_dest_mode;
+	bool	dest_mode_logical;
 
 	u32	(*calc_dest_apicid)(unsigned int cpu);
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 29d28b34cb2f..54f04355aaa2 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1591,7 +1591,7 @@ static void setup_local_APIC(void)
 	apic->init_apic_ldr();
 
 #ifdef CONFIG_X86_32
-	if (apic->irq_dest_mode == 1) {
+	if (apic->dest_mode_logical) {
 		int logical_apicid, ldr_apicid;
 
 		/*
diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index bbb1b89fe711..8f72b4351c9f 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -114,7 +114,7 @@ static struct apic apic_flat __ro_after_init = {
 	.apic_id_registered		= flat_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	.irq_dest_mode			= 1, /* logical */
+	.dest_mode_logical		= true,
 
 	.disable_esr			= 0,
 
@@ -205,7 +205,7 @@ static struct apic apic_physflat __ro_after_init = {
 	.apic_id_registered		= flat_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	.irq_dest_mode			= 0, /* physical */
+	.dest_mode_logical		= false,
 
 	.disable_esr			= 0,
 
diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
index 38f167ce5031..fe78319e0f7a 100644
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -96,8 +96,7 @@ struct apic apic_noop __ro_after_init = {
 	.apic_id_registered		= noop_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	/* logical delivery broadcast to all CPUs: */
-	.irq_dest_mode			= 1,
+	.dest_mode_logical		= true,
 
 	.disable_esr			= 0,
 
@@ -105,7 +104,6 @@ struct apic apic_noop __ro_after_init = {
 	.init_apic_ldr			= noop_init_apic_ldr,
 	.ioapic_phys_id_map		= default_ioapic_phys_id_map,
 	.setup_apic_routing		= NULL,
-
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
 	.apicid_to_cpu_present		= physid_set_mask_of_physid,
 
diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c
index 4ebf9fe2c95d..a54d817eb4b6 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -247,7 +247,7 @@ static const struct apic apic_numachip1 __refconst = {
 	.apic_id_registered		= numachip_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	.irq_dest_mode			= 0, /* physical */
+	.dest_mode_logical		= false,
 
 	.disable_esr			= 0,
 
@@ -294,7 +294,7 @@ static const struct apic apic_numachip2 __refconst = {
 	.apic_id_registered		= numachip_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	.irq_dest_mode			= 0, /* physical */
+	.dest_mode_logical		= false,
 
 	.disable_esr			= 0,
 
diff --git a/arch/x86/kernel/apic/bigsmp_32.c b/arch/x86/kernel/apic/bigsmp_32.c
index 64c375b8c54e..77555f66c14d 100644
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -128,8 +128,7 @@ static struct apic apic_bigsmp __ro_after_init = {
 	.apic_id_registered		= bigsmp_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	/* phys delivery to target CPU: */
-	.irq_dest_mode			= 0,
+	.dest_mode_logical		= false,
 
 	.disable_esr			= 1,
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index cff6cbc3d183..c6d92d2570d0 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2950,7 +2950,7 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 {
 	memset(entry, 0, sizeof(*entry));
 	entry->delivery_mode = apic->delivery_mode;
-	entry->dest_mode     = apic->irq_dest_mode;
+	entry->dest_mode     = apic->dest_mode_logical;
 	entry->dest	     = cfg->dest_apicid;
 	entry->vector	     = cfg->vector;
 	entry->trigger	     = data->trigger;
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 516df47bde73..46ffd41a4238 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -30,9 +30,9 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 
 	msg->address_lo =
 		MSI_ADDR_BASE_LO |
-		((apic->irq_dest_mode == 0) ?
-			MSI_ADDR_DEST_MODE_PHYSICAL :
-			MSI_ADDR_DEST_MODE_LOGICAL) |
+		(apic->dest_mode_logical ?
+			MSI_ADDR_DEST_MODE_LOGICAL :
+			MSI_ADDR_DEST_MODE_PHYSICAL) |
 		MSI_ADDR_REDIRECTION_CPU |
 		MSI_ADDR_DEST_ID(cfg->dest_apicid);
 
diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index 97652aacf3e1..a61f642b1b90 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -70,8 +70,7 @@ static struct apic apic_default __ro_after_init = {
 	.apic_id_registered		= default_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	/* logical delivery broadcast to all CPUs: */
-	.irq_dest_mode			= 1,
+	.dest_mode_logical		= true,
 
 	.disable_esr			= 0,
 
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 53390fc9f51e..df6adc5674c9 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -185,7 +185,7 @@ static struct apic apic_x2apic_cluster __ro_after_init = {
 	.apic_id_registered		= x2apic_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	.irq_dest_mode			= 1, /* logical */
+	.dest_mode_logical		= true,
 
 	.disable_esr			= 0,
 
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index ee0c4d08092c..0e4e81971567 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -158,7 +158,7 @@ static struct apic apic_x2apic_phys __ro_after_init = {
 	.apic_id_registered		= x2apic_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	.irq_dest_mode			= 0, /* physical */
+	.dest_mode_logical		= false,
 
 	.disable_esr			= 0,
 
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index d21a6853afee..de94181f4d0c 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -808,7 +808,7 @@ static struct apic apic_x2apic_uv_x __ro_after_init = {
 	.apic_id_registered		= uv_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	.irq_dest_mode			= 0, /* Physical */
+	.dest_mode_logical		= false,
 
 	.disable_esr			= 0,
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6c14f1091f60..d133d6580f41 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -747,7 +747,7 @@ static void __init smp_quirk_init_udelay(void)
 int
 wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip)
 {
-	u32 dm = apic->irq_dest_mode ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
+	u32 dm = apic->dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
 	unsigned long send_status, accept_status = 0;
 	int maxlvt;
 
@@ -981,10 +981,7 @@ wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
 	if (!boot_error) {
 		enable_start_cpu0 = 1;
 		*cpu0_nmi_registered = 1;
-		if (apic->irq_dest_mode)
-			id = cpu0_logical_apicid;
-		else
-			id = apicid;
+		id = apic->dest_mode_logical ? cpu0_logical_apicid : apicid;
 		boot_error = wakeup_secondary_cpu_via_nmi(id, start_ip);
 	}
 
diff --git a/arch/x86/platform/uv/uv_irq.c b/arch/x86/platform/uv/uv_irq.c
index e7020d162949..1a536a187d74 100644
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -36,7 +36,7 @@ static void uv_program_mmr(struct irq_cfg *cfg, struct uv_irq_2_mmr_pnode *info)
 	entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
 	entry->vector		= cfg->vector;
 	entry->delivery_mode	= apic->delivery_mode;
-	entry->dest_mode	= apic->irq_dest_mode;
+	entry->dest_mode	= apic->dest_mode_logical;
 	entry->polarity		= 0;
 	entry->trigger		= 0;
 	entry->mask		= 0;
diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index c35c24b5bc01..0d46cc283cf5 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -148,8 +148,7 @@ static struct apic xen_pv_apic = {
 	.apic_id_valid 			= xen_id_always_valid,
 	.apic_id_registered 		= xen_id_always_registered,
 
-	/* .irq_delivery_mode - used in native_compose_msi_msg only */
-	/* .irq_dest_mode     - used in native_compose_msi_msg only */
+	/* .delivery_mode and .dest_mode_logical not used by XENPV */
 
 	.disable_esr			= 0,
 
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index f696ac7c5f89..ba74a722a400 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -893,7 +893,7 @@ struct amd_ir_data {
 };
 
 struct amd_irte_ops {
-	void (*prepare)(void *, u32, u32, u8, u32, int);
+	void (*prepare)(void *, u32, bool, u8, u32, int);
 	void (*activate)(void *, u16, u16);
 	void (*deactivate)(void *, u16, u16);
 	void (*set_affinity)(void *, u16, u16, u8, u32);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index bc81b91f89fe..d7f0c8908602 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3466,7 +3466,7 @@ static void free_irte(u16 devid, int index)
 }
 
 static void irte_prepare(void *entry,
-			 u32 delivery_mode, u32 dest_mode,
+			 u32 delivery_mode, bool dest_mode,
 			 u8 vector, u32 dest_apicid, int devid)
 {
 	union irte *irte = (union irte *) entry;
@@ -3480,7 +3480,7 @@ static void irte_prepare(void *entry,
 }
 
 static void irte_ga_prepare(void *entry,
-			    u32 delivery_mode, u32 dest_mode,
+			    u32 delivery_mode, bool dest_mode,
 			    u8 vector, u32 dest_apicid, int devid)
 {
 	struct irte_ga *irte = (struct irte_ga *) entry;
@@ -3672,7 +3672,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 	data->irq_2_irte.devid = devid;
 	data->irq_2_irte.index = index + sub_handle;
 	iommu->irte_ops->prepare(data->entry, apic->delivery_mode,
-				 apic->irq_dest_mode, irq_cfg->vector,
+				 apic->dest_mode_logical, irq_cfg->vector,
 				 irq_cfg->dest_apicid, devid);
 
 	switch (info->type) {
@@ -3943,7 +3943,7 @@ int amd_iommu_deactivate_guest_mode(void *data)
 	entry->hi.val = 0;
 
 	entry->lo.fields_remap.valid       = valid;
-	entry->lo.fields_remap.dm          = apic->irq_dest_mode;
+	entry->lo.fields_remap.dm          = apic->dest_mode_logical;
 	entry->lo.fields_remap.int_type    = apic->delivery_mode;
 	entry->hi.fields.vector            = cfg->vector;
 	entry->lo.fields_remap.destination =
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index d44e719d1984..5628d43b795e 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1113,7 +1113,7 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 	memset(irte, 0, sizeof(*irte));
 
 	irte->present = 1;
-	irte->dst_mode = apic->irq_dest_mode;
+	irte->dst_mode = apic->dest_mode_logical;
 	/*
 	 * Trigger mode in the IRTE will always be edge, and for IO-APIC, the
 	 * actual level or edge trigger will be setup in the IO-APIC
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 09/35] x86/apic: Always provide irq_compose_msi_msg() method for vector domain
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (7 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 08/35] x86/apic: Cleanup destination mode David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 10/35] x86/hpet: Move MSI support into hpet.c David Woodhouse
                                       ` (26 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

This shouldn't be dependent on PCI_MSI. HPET and I/O-APIC can deliver
interrupts through MSI without having any PCI in the system at all.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/apic.h   |  8 +++-----
 arch/x86/kernel/apic/apic.c   | 32 ++++++++++++++++++++++++++++++
 arch/x86/kernel/apic/msi.c    | 37 -----------------------------------
 arch/x86/kernel/apic/vector.c |  6 ++++++
 4 files changed, 41 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 652f62252349..b489bb9894ae 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -520,12 +520,10 @@ static inline void apic_smt_update(void) { }
 #endif
 
 struct msi_msg;
+struct irq_cfg;
 
-#ifdef CONFIG_PCI_MSI
-void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg);
-#else
-# define x86_vector_msi_compose_msg NULL
-#endif
+extern void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+				  bool dmar);
 
 extern void ioapic_zap_locks(void);
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 54f04355aaa2..4c15bf29ea2c 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -50,6 +50,7 @@
 #include <asm/io_apic.h>
 #include <asm/desc.h>
 #include <asm/hpet.h>
+#include <asm/msidef.h>
 #include <asm/mtrr.h>
 #include <asm/time.h>
 #include <asm/smp.h>
@@ -2480,6 +2481,37 @@ int hard_smp_processor_id(void)
 	return read_apic_id();
 }
 
+void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+			   bool dmar)
+{
+	msg->address_hi = MSI_ADDR_BASE_HI;
+
+	msg->address_lo =
+		MSI_ADDR_BASE_LO |
+		(apic->dest_mode_logical ?
+			MSI_ADDR_DEST_MODE_LOGICAL :
+			MSI_ADDR_DEST_MODE_PHYSICAL) |
+		MSI_ADDR_REDIRECTION_CPU |
+		MSI_ADDR_DEST_ID(cfg->dest_apicid);
+
+	msg->data =
+		MSI_DATA_TRIGGER_EDGE |
+		MSI_DATA_LEVEL_ASSERT |
+		MSI_DATA_DELIVERY_FIXED |
+		MSI_DATA_VECTOR(cfg->vector);
+
+	/*
+	 * Only the IOMMU itself can use the trick of putting destination
+	 * APIC ID into the high bits of the address. Anything else would
+	 * just be writing to memory if it tried that, and needs IR to
+	 * address higher APIC IDs.
+	 */
+	if (dmar)
+		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+	else
+		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
+}
+
 /*
  * Override the generic EOI implementation with an optimized version.
  * Only called during early boot when only one CPU is active and with
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 46ffd41a4238..4eda617eda1e 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -15,7 +15,6 @@
 #include <linux/hpet.h>
 #include <linux/msi.h>
 #include <asm/irqdomain.h>
-#include <asm/msidef.h>
 #include <asm/hpet.h>
 #include <asm/hw_irq.h>
 #include <asm/apic.h>
@@ -23,42 +22,6 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
-static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
-				  bool dmar)
-{
-	msg->address_hi = MSI_ADDR_BASE_HI;
-
-	msg->address_lo =
-		MSI_ADDR_BASE_LO |
-		(apic->dest_mode_logical ?
-			MSI_ADDR_DEST_MODE_LOGICAL :
-			MSI_ADDR_DEST_MODE_PHYSICAL) |
-		MSI_ADDR_REDIRECTION_CPU |
-		MSI_ADDR_DEST_ID(cfg->dest_apicid);
-
-	msg->data =
-		MSI_DATA_TRIGGER_EDGE |
-		MSI_DATA_LEVEL_ASSERT |
-		MSI_DATA_DELIVERY_FIXED |
-		MSI_DATA_VECTOR(cfg->vector);
-
-	/*
-	 * Only the IOMMU itself can use the trick of putting destination
-	 * APIC ID into the high bits of the address. Anything else would
-	 * just be writing to memory if it tried that, and needs IR to
-	 * address higher APIC IDs.
-	 */
-	if (dmar)
-		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
-	else
-		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
-}
-
-void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
-{
-	__irq_msi_compose_msg(irqd_cfg(data), msg, false);
-}
-
 static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 {
 	struct msi_msg msg[2] = { [1] = { }, };
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 1eac53632786..bb2e2a2488a5 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -818,6 +818,12 @@ void apic_ack_edge(struct irq_data *irqd)
 	apic_ack_irq(irqd);
 }
 
+static void x86_vector_msi_compose_msg(struct irq_data *data,
+				       struct msi_msg *msg)
+{
+       __irq_msi_compose_msg(irqd_cfg(data), msg, false);
+}
+
 static struct irq_chip lapic_controller = {
 	.name			= "APIC",
 	.irq_ack		= apic_ack_edge,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 10/35] x86/hpet: Move MSI support into hpet.c
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (8 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 09/35] x86/apic: Always provide irq_compose_msi_msg() method for vector domain David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 11/35] genirq/msi: Allow shadow declarations of msi_msg::$member David Woodhouse
                                       ` (25 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

This isn't really dependent on PCI MSI; it's just generic MSI which is now
supported by the generic x86_vector_domain. Move the HPET MSI support back
into hpet.c with the rest of the HPET support.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/hpet.h |  11 ----
 arch/x86/kernel/apic/msi.c  | 111 ---------------------------------
 arch/x86/kernel/hpet.c      | 118 ++++++++++++++++++++++++++++++++++--
 3 files changed, 112 insertions(+), 128 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 6352dee37cda..ab9f3dd87c80 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -74,17 +74,6 @@ extern void hpet_disable(void);
 extern unsigned int hpet_readl(unsigned int a);
 extern void force_hpet_resume(void);
 
-struct irq_data;
-struct hpet_channel;
-struct irq_domain;
-
-extern void hpet_msi_unmask(struct irq_data *data);
-extern void hpet_msi_mask(struct irq_data *data);
-extern void hpet_msi_write(struct hpet_channel *hc, struct msi_msg *msg);
-extern struct irq_domain *hpet_create_irq_domain(int hpet_id);
-extern int hpet_assign_irq(struct irq_domain *domain,
-			   struct hpet_channel *hc, int dev_num);
-
 #ifdef CONFIG_HPET_EMULATE_RTC
 
 #include <linux/interrupt.h>
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 4eda617eda1e..44ebe25e7703 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -340,114 +340,3 @@ void dmar_free_hwirq(int irq)
 	irq_domain_free_irqs(irq, 1);
 }
 #endif
-
-/*
- * MSI message composition
- */
-#ifdef CONFIG_HPET_TIMER
-static inline int hpet_dev_id(struct irq_domain *domain)
-{
-	struct msi_domain_info *info = msi_get_domain_info(domain);
-
-	return (int)(long)info->data;
-}
-
-static void hpet_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
-{
-	hpet_msi_write(irq_data_get_irq_handler_data(data), msg);
-}
-
-static struct irq_chip hpet_msi_controller __ro_after_init = {
-	.name = "HPET-MSI",
-	.irq_unmask = hpet_msi_unmask,
-	.irq_mask = hpet_msi_mask,
-	.irq_ack = irq_chip_ack_parent,
-	.irq_set_affinity = msi_domain_set_affinity,
-	.irq_retrigger = irq_chip_retrigger_hierarchy,
-	.irq_write_msi_msg = hpet_msi_write_msg,
-	.flags = IRQCHIP_SKIP_SET_WAKE,
-};
-
-static int hpet_msi_init(struct irq_domain *domain,
-			 struct msi_domain_info *info, unsigned int virq,
-			 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
-{
-	irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
-	irq_domain_set_info(domain, virq, arg->hwirq, info->chip, NULL,
-			    handle_edge_irq, arg->data, "edge");
-
-	return 0;
-}
-
-static void hpet_msi_free(struct irq_domain *domain,
-			  struct msi_domain_info *info, unsigned int virq)
-{
-	irq_clear_status_flags(virq, IRQ_MOVE_PCNTXT);
-}
-
-static struct msi_domain_ops hpet_msi_domain_ops = {
-	.msi_init	= hpet_msi_init,
-	.msi_free	= hpet_msi_free,
-};
-
-static struct msi_domain_info hpet_msi_domain_info = {
-	.ops		= &hpet_msi_domain_ops,
-	.chip		= &hpet_msi_controller,
-	.flags		= MSI_FLAG_USE_DEF_DOM_OPS,
-};
-
-struct irq_domain *hpet_create_irq_domain(int hpet_id)
-{
-	struct msi_domain_info *domain_info;
-	struct irq_domain *parent, *d;
-	struct irq_alloc_info info;
-	struct fwnode_handle *fn;
-
-	if (x86_vector_domain == NULL)
-		return NULL;
-
-	domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
-	if (!domain_info)
-		return NULL;
-
-	*domain_info = hpet_msi_domain_info;
-	domain_info->data = (void *)(long)hpet_id;
-
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-	info.devid = hpet_id;
-	parent = irq_remapping_get_irq_domain(&info);
-	if (parent == NULL)
-		parent = x86_vector_domain;
-	else
-		hpet_msi_controller.name = "IR-HPET-MSI";
-
-	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
-					      hpet_id);
-	if (!fn) {
-		kfree(domain_info);
-		return NULL;
-	}
-
-	d = msi_create_irq_domain(fn, domain_info, parent);
-	if (!d) {
-		irq_domain_free_fwnode(fn);
-		kfree(domain_info);
-	}
-	return d;
-}
-
-int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc,
-		    int dev_num)
-{
-	struct irq_alloc_info info;
-
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET;
-	info.data = hc;
-	info.devid = hpet_dev_id(domain);
-	info.hwirq = dev_num;
-
-	return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
-}
-#endif
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 7a50f0b62a70..3b8b12769f3b 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -7,6 +7,7 @@
 #include <linux/cpu.h>
 #include <linux/irq.h>
 
+#include <asm/irq_remapping.h>
 #include <asm/hpet.h>
 #include <asm/time.h>
 
@@ -50,7 +51,7 @@ unsigned long				hpet_address;
 u8					hpet_blockid; /* OS timer block num */
 bool					hpet_msi_disable;
 
-#ifdef CONFIG_PCI_MSI
+#ifdef CONFIG_GENERIC_MSI_IRQ
 static DEFINE_PER_CPU(struct hpet_channel *, cpu_hpet_channel);
 static struct irq_domain		*hpet_domain;
 #endif
@@ -467,9 +468,8 @@ static void __init hpet_legacy_clockevent_register(struct hpet_channel *hc)
 /*
  * HPET MSI Support
  */
-#ifdef CONFIG_PCI_MSI
-
-void hpet_msi_unmask(struct irq_data *data)
+#ifdef CONFIG_GENERIC_MSI_IRQ
+static void hpet_msi_unmask(struct irq_data *data)
 {
 	struct hpet_channel *hc = irq_data_get_irq_handler_data(data);
 	unsigned int cfg;
@@ -479,7 +479,7 @@ void hpet_msi_unmask(struct irq_data *data)
 	hpet_writel(cfg, HPET_Tn_CFG(hc->num));
 }
 
-void hpet_msi_mask(struct irq_data *data)
+static void hpet_msi_mask(struct irq_data *data)
 {
 	struct hpet_channel *hc = irq_data_get_irq_handler_data(data);
 	unsigned int cfg;
@@ -489,12 +489,118 @@ void hpet_msi_mask(struct irq_data *data)
 	hpet_writel(cfg, HPET_Tn_CFG(hc->num));
 }
 
-void hpet_msi_write(struct hpet_channel *hc, struct msi_msg *msg)
+static void hpet_msi_write(struct hpet_channel *hc, struct msi_msg *msg)
 {
 	hpet_writel(msg->data, HPET_Tn_ROUTE(hc->num));
 	hpet_writel(msg->address_lo, HPET_Tn_ROUTE(hc->num) + 4);
 }
 
+static void hpet_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	hpet_msi_write(irq_data_get_irq_handler_data(data), msg);
+}
+
+static struct irq_chip hpet_msi_controller __ro_after_init = {
+	.name = "HPET-MSI",
+	.irq_unmask = hpet_msi_unmask,
+	.irq_mask = hpet_msi_mask,
+	.irq_ack = irq_chip_ack_parent,
+	.irq_set_affinity = msi_domain_set_affinity,
+	.irq_retrigger = irq_chip_retrigger_hierarchy,
+	.irq_write_msi_msg = hpet_msi_write_msg,
+	.flags = IRQCHIP_SKIP_SET_WAKE,
+};
+
+static int hpet_msi_init(struct irq_domain *domain,
+			 struct msi_domain_info *info, unsigned int virq,
+			 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
+{
+	irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
+	irq_domain_set_info(domain, virq, arg->hwirq, info->chip, NULL,
+			    handle_edge_irq, arg->data, "edge");
+
+	return 0;
+}
+
+static void hpet_msi_free(struct irq_domain *domain,
+			  struct msi_domain_info *info, unsigned int virq)
+{
+	irq_clear_status_flags(virq, IRQ_MOVE_PCNTXT);
+}
+
+static struct msi_domain_ops hpet_msi_domain_ops = {
+	.msi_init	= hpet_msi_init,
+	.msi_free	= hpet_msi_free,
+};
+
+static struct msi_domain_info hpet_msi_domain_info = {
+	.ops		= &hpet_msi_domain_ops,
+	.chip		= &hpet_msi_controller,
+	.flags		= MSI_FLAG_USE_DEF_DOM_OPS,
+};
+
+static struct irq_domain *hpet_create_irq_domain(int hpet_id)
+{
+	struct msi_domain_info *domain_info;
+	struct irq_domain *parent, *d;
+	struct irq_alloc_info info;
+	struct fwnode_handle *fn;
+
+	if (x86_vector_domain == NULL)
+		return NULL;
+
+	domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
+	if (!domain_info)
+		return NULL;
+
+	*domain_info = hpet_msi_domain_info;
+	domain_info->data = (void *)(long)hpet_id;
+
+	init_irq_alloc_info(&info, NULL);
+	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
+	info.devid = hpet_id;
+	parent = irq_remapping_get_irq_domain(&info);
+	if (parent == NULL)
+		parent = x86_vector_domain;
+	else
+		hpet_msi_controller.name = "IR-HPET-MSI";
+
+	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
+					      hpet_id);
+	if (!fn) {
+		kfree(domain_info);
+		return NULL;
+	}
+
+	d = msi_create_irq_domain(fn, domain_info, parent);
+	if (!d) {
+		irq_domain_free_fwnode(fn);
+		kfree(domain_info);
+	}
+	return d;
+}
+
+static inline int hpet_dev_id(struct irq_domain *domain)
+{
+	struct msi_domain_info *info = msi_get_domain_info(domain);
+
+	return (int)(long)info->data;
+}
+
+static int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc,
+			   int dev_num)
+{
+	struct irq_alloc_info info;
+
+	init_irq_alloc_info(&info, NULL);
+	info.type = X86_IRQ_ALLOC_TYPE_HPET;
+	info.data = hc;
+	info.devid = hpet_dev_id(domain);
+	info.hwirq = dev_num;
+
+	return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
+}
+
 static int hpet_clkevt_msi_resume(struct clock_event_device *evt)
 {
 	struct hpet_channel *hc = clockevent_to_channel(evt);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 11/35] genirq/msi: Allow shadow declarations of msi_msg::$member
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (9 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 10/35] x86/hpet: Move MSI support into hpet.c David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] genirq/msi: Allow shadow declarations of msi_msg:: $member tip-bot2 for Thomas Gleixner
  2020-10-24 21:35                     ` [PATCH v3 12/35] x86/msi: Provide msi message shadow structs David Woodhouse
                                       ` (24 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

Architectures like x86 have their MSI messages in various bits of the data,
address_lo and address_hi field. Composing or decomposing these messages
with bitmasks and shifts is possible, but unreadable gunk.

Allow architectures to provide an architecture specific representation for
each member of msi_msg. Provide empty defaults for each and stick them into
an union.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 include/asm-generic/msi.h |  4 ++++
 include/linux/msi.h       | 46 +++++++++++++++++++++++++++++++++++----
 2 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/msi.h b/include/asm-generic/msi.h
index e6795f088bdd..25344de0e8f9 100644
--- a/include/asm-generic/msi.h
+++ b/include/asm-generic/msi.h
@@ -4,6 +4,8 @@
 
 #include <linux/types.h>
 
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+
 #ifndef NUM_MSI_ALLOC_SCRATCHPAD_REGS
 # define NUM_MSI_ALLOC_SCRATCHPAD_REGS	2
 #endif
@@ -30,4 +32,6 @@ typedef struct msi_alloc_info {
 
 #define GENERIC_MSI_DOMAIN_OPS		1
 
+#endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
+
 #endif
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 6b584cc4757c..360a0a7e7341 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -4,11 +4,50 @@
 
 #include <linux/kobject.h>
 #include <linux/list.h>
+#include <asm/msi.h>
+
+/* Dummy shadow structures if an architecture does not define them */
+#ifndef arch_msi_msg_addr_lo
+typedef struct arch_msi_msg_addr_lo {
+	u32	address_lo;
+} __attribute__ ((packed)) arch_msi_msg_addr_lo_t;
+#endif
+
+#ifndef arch_msi_msg_addr_hi
+typedef struct arch_msi_msg_addr_hi {
+	u32	address_hi;
+} __attribute__ ((packed)) arch_msi_msg_addr_hi_t;
+#endif
+
+#ifndef arch_msi_msg_data
+typedef struct arch_msi_msg_data {
+	u32	data;
+} __attribute__ ((packed)) arch_msi_msg_data_t;
+#endif
 
+/**
+ * msi_msg - Representation of a MSI message
+ * @address_lo:		Low 32 bits of msi message address
+ * @arch_addrlo:	Architecture specific shadow of @address_lo
+ * @address_hi:		High 32 bits of msi message address
+ *			(only used when device supports it)
+ * @arch_addrhi:	Architecture specific shadow of @address_hi
+ * @data:		MSI message data (usually 16 bits)
+ * @arch_data:		Architecture specific shadow of @data
+ */
 struct msi_msg {
-	u32	address_lo;	/* low 32 bits of msi message address */
-	u32	address_hi;	/* high 32 bits of msi message address */
-	u32	data;		/* 16 bits of msi message data */
+	union {
+		u32			address_lo;
+		arch_msi_msg_addr_lo_t	arch_addr_lo;
+	};
+	union {
+		u32			address_hi;
+		arch_msi_msg_addr_hi_t	arch_addr_hi;
+	};
+	union {
+		u32			data;
+		arch_msi_msg_data_t	arch_data;
+	};
 };
 
 extern int pci_msi_ignore_mask;
@@ -243,7 +282,6 @@ struct msi_controller {
 #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
 
 #include <linux/irqhandler.h>
-#include <asm/msi.h>
 
 struct irq_domain;
 struct irq_domain_ops;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 12/35] x86/msi: Provide msi message shadow structs
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (10 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 11/35] genirq/msi: Allow shadow declarations of msi_msg::$member David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  2022-04-06  8:36                       ` [PATCH v3 12/35] " Reto Buerki
  2020-10-24 21:35                     ` [PATCH v3 13/35] iommu/intel: Use msi_msg " David Woodhouse
                                       ` (23 subsequent siblings)
  35 siblings, 2 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

Create shadow structs with named bitfields for msi_msg data, address_lo and
address_hi and use them in the MSI message composer.

Provide a function to retrieve the destination ID. This could be inline,
but that'd create a circular header dependency.

[dwmw2: fix bitfields not all to be a union]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/msi.h  | 49 +++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/apic/apic.c | 35 ++++++++++++++------------
 2 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index cd30013d15d3..322fd905da9c 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -9,4 +9,53 @@ typedef struct irq_alloc_info msi_alloc_info_t;
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 		    msi_alloc_info_t *arg);
 
+/* Structs and defines for the X86 specific MSI message format */
+
+typedef struct x86_msi_data {
+	u32	vector			:  8,
+		delivery_mode		:  3,
+		dest_mode_logical	:  1,
+		reserved		:  2,
+		active_low		:  1,
+		is_level		:  1;
+
+	u32	dmar_subhandle;
+} __attribute__ ((packed)) arch_msi_msg_data_t;
+#define arch_msi_msg_data	x86_msi_data
+
+typedef struct x86_msi_addr_lo {
+	union {
+		struct {
+			u32	reserved_0		:  2,
+				dest_mode_logical	:  1,
+				redirect_hint		:  1,
+				reserved_1		:  8,
+				destid_0_7		:  8,
+				base_address		: 12;
+		};
+		struct {
+			u32	dmar_reserved_0		:  2,
+				dmar_index_15		:  1,
+				dmar_subhandle_valid	:  1,
+				dmar_format		:  1,
+				dmar_index_0_14		: 15,
+				dmar_base_address	: 12;
+		};
+	};
+} __attribute__ ((packed)) arch_msi_msg_addr_lo_t;
+#define arch_msi_msg_addr_lo	x86_msi_addr_lo
+
+#define X86_MSI_BASE_ADDRESS_LOW	(0xfee00000 >> 20)
+
+typedef struct x86_msi_addr_hi {
+	u32	reserved		:  8,
+		destid_8_31		: 24;
+} __attribute__ ((packed)) arch_msi_msg_addr_hi_t;
+#define arch_msi_msg_addr_hi	x86_msi_addr_hi
+
+#define X86_MSI_BASE_ADDRESS_HIGH	(0)
+
+struct msi_msg;
+u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid);
+
 #endif /* _ASM_X86_MSI_H */
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 4c15bf29ea2c..f7196ee0f005 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -50,7 +50,6 @@
 #include <asm/io_apic.h>
 #include <asm/desc.h>
 #include <asm/hpet.h>
-#include <asm/msidef.h>
 #include <asm/mtrr.h>
 #include <asm/time.h>
 #include <asm/smp.h>
@@ -2484,22 +2483,16 @@ int hard_smp_processor_id(void)
 void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 			   bool dmar)
 {
-	msg->address_hi = MSI_ADDR_BASE_HI;
+	memset(msg, 0, sizeof(*msg));
 
-	msg->address_lo =
-		MSI_ADDR_BASE_LO |
-		(apic->dest_mode_logical ?
-			MSI_ADDR_DEST_MODE_LOGICAL :
-			MSI_ADDR_DEST_MODE_PHYSICAL) |
-		MSI_ADDR_REDIRECTION_CPU |
-		MSI_ADDR_DEST_ID(cfg->dest_apicid);
+	msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
+	msg->arch_addr_lo.dest_mode_logical = apic->dest_mode_logical;
+	msg->arch_addr_lo.destid_0_7 = cfg->dest_apicid & 0xFF;
 
-	msg->data =
-		MSI_DATA_TRIGGER_EDGE |
-		MSI_DATA_LEVEL_ASSERT |
-		MSI_DATA_DELIVERY_FIXED |
-		MSI_DATA_VECTOR(cfg->vector);
+	msg->arch_data.delivery_mode = APIC_DELIVERY_MODE_FIXED;
+	msg->arch_data.vector = cfg->vector;
 
+	msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
 	/*
 	 * Only the IOMMU itself can use the trick of putting destination
 	 * APIC ID into the high bits of the address. Anything else would
@@ -2507,11 +2500,21 @@ void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 	 * address higher APIC IDs.
 	 */
 	if (dmar)
-		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+		msg->arch_addr_hi.destid_8_31 = cfg->dest_apicid >> 8;
 	else
-		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
+		WARN_ON_ONCE(cfg->dest_apicid > 0xFF);
 }
 
+u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid)
+{
+	u32 dest = msg->arch_addr_lo.destid_0_7;
+
+	if (extid)
+		dest |= msg->arch_addr_hi.destid_8_31 << 8;
+	return dest;
+}
+EXPORT_SYMBOL_GPL(x86_msi_msg_get_destid);
+
 /*
  * Override the generic EOI implementation with an optimized version.
  * Only called during early boot when only one CPU is active and with
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 13/35] iommu/intel: Use msi_msg shadow structs
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (11 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 12/35] x86/msi: Provide msi message shadow structs David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  2020-10-24 21:35                     ` [PATCH v3 14/35] iommu/amd: " David Woodhouse
                                       ` (22 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

Use the bitfields in the x86 shadow struct to compose the MSI message.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/iommu/intel/irq_remapping.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 5628d43b795e..30269b738fa5 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -20,7 +20,6 @@
 #include <asm/cpu.h>
 #include <asm/irq_remapping.h>
 #include <asm/pci-direct.h>
-#include <asm/msidef.h>
 
 #include "../irq_remapping.h"
 
@@ -1260,6 +1259,21 @@ static struct irq_chip intel_ir_chip = {
 	.irq_set_vcpu_affinity	= intel_ir_set_vcpu_affinity,
 };
 
+static void fill_msi_msg(struct msi_msg *msg, u32 index, u32 subhandle)
+{
+	memset(msg, 0, sizeof(*msg));
+
+	msg->arch_addr_lo.dmar_base_address = X86_MSI_BASE_ADDRESS_LOW;
+	msg->arch_addr_lo.dmar_subhandle_valid = true;
+	msg->arch_addr_lo.dmar_format = true;
+	msg->arch_addr_lo.dmar_index_0_14 = index & 0x7FFF;
+	msg->arch_addr_lo.dmar_index_15 = !!(index & 0x8000);
+
+	msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
+
+	msg->arch_data.dmar_subhandle = subhandle;
+}
+
 static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 					     struct irq_cfg *irq_cfg,
 					     struct irq_alloc_info *info,
@@ -1267,7 +1281,6 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 {
 	struct IR_IO_APIC_route_entry *entry;
 	struct irte *irte = &data->irte_entry;
-	struct msi_msg *msg = &data->msi_entry;
 
 	prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
 	switch (info->type) {
@@ -1308,12 +1321,7 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 		else
 			set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
 
-		msg->address_hi = MSI_ADDR_BASE_HI;
-		msg->data = sub_handle;
-		msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
-				  MSI_ADDR_IR_SHV |
-				  MSI_ADDR_IR_INDEX1(index) |
-				  MSI_ADDR_IR_INDEX2(index);
+		fill_msi_msg(&data->msi_entry, index, sub_handle);
 		break;
 
 	default:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 14/35] iommu/amd: Use msi_msg shadow structs
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (12 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 13/35] iommu/intel: Use msi_msg " David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  2020-10-24 21:35                     ` [PATCH v3 15/35] PCI: vmd: " David Woodhouse
                                       ` (21 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

Get rid of the macro mess and use the shadow structs for the x86 specific
MSI message format. Convert the intcapxt setup to use named bitfields as
well while touching it anyway.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/iommu/amd/init.c  | 46 ++++++++++++++++++++++-----------------
 drivers/iommu/amd/iommu.c | 14 +++++++-----
 2 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 82e4af8f09bb..263670d36fed 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -23,7 +23,6 @@
 #include <asm/pci-direct.h>
 #include <asm/iommu.h>
 #include <asm/apic.h>
-#include <asm/msidef.h>
 #include <asm/gart.h>
 #include <asm/x86_init.h>
 #include <asm/iommu_table.h>
@@ -1966,10 +1965,16 @@ static int iommu_setup_msi(struct amd_iommu *iommu)
 	return 0;
 }
 
-#define XT_INT_DEST_MODE(x)	(((x) & 0x1ULL) << 2)
-#define XT_INT_DEST_LO(x)	(((x) & 0xFFFFFFULL) << 8)
-#define XT_INT_VEC(x)		(((x) & 0xFFULL) << 32)
-#define XT_INT_DEST_HI(x)	((((x) >> 24) & 0xFFULL) << 56)
+union intcapxt {
+	u64	capxt;
+	u64	reserved_0		:  2,
+		dest_mode_logical	:  1,
+		reserved_1		:  5,
+		destid_0_23		: 24,
+		vector			:  8,
+		reserved_2		: 16,
+		destid_24_31		:  8;
+} __attribute__ ((packed));
 
 /*
  * Setup the IntCapXT registers with interrupt routing information
@@ -1978,28 +1983,29 @@ static int iommu_setup_msi(struct amd_iommu *iommu)
  */
 static void iommu_update_intcapxt(struct amd_iommu *iommu)
 {
-	u64 val;
-	u32 addr_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET);
-	u32 addr_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET);
-	u32 data    = readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET);
-	bool dm     = (addr_lo >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
-	u32 dest    = ((addr_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xFF);
+	struct msi_msg msg;
+	union intcapxt xt;
+	u32 destid;
 
-	if (x2apic_enabled())
-		dest |= MSI_ADDR_EXT_DEST_ID(addr_hi);
+	msg.address_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET);
+	msg.address_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET);
+	msg.data = readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET);
 
-	val = XT_INT_VEC(data & 0xFF) |
-	      XT_INT_DEST_MODE(dm) |
-	      XT_INT_DEST_LO(dest) |
-	      XT_INT_DEST_HI(dest);
+	destid = x86_msi_msg_get_destid(&msg, x2apic_enabled());
+
+	xt.capxt = 0ULL;
+	xt.dest_mode_logical = msg.arch_data.dest_mode_logical;
+	xt.vector = msg.arch_data.vector;
+	xt.destid_0_23 = destid & GENMASK(23, 0);
+	xt.destid_24_31 = destid >> 24;
 
 	/**
 	 * Current IOMMU implemtation uses the same IRQ for all
 	 * 3 IOMMU interrupts.
 	 */
-	writeq(val, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
-	writeq(val, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
-	writeq(val, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
+	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
+	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
+	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
 }
 
 static void _irq_notifier_notify(struct irq_affinity_notify *notify,
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index d7f0c8908602..473de0920b64 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -35,7 +35,6 @@
 #include <asm/io_apic.h>
 #include <asm/apic.h>
 #include <asm/hw_irq.h>
-#include <asm/msidef.h>
 #include <asm/proto.h>
 #include <asm/iommu.h>
 #include <asm/gart.h>
@@ -3656,13 +3655,20 @@ struct irq_remap_ops amd_iommu_irq_ops = {
 	.get_irq_domain		= get_irq_domain,
 };
 
+static void fill_msi_msg(struct msi_msg *msg, u32 index)
+{
+	msg->data = index;
+	msg->address_lo = 0;
+	msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
+	msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
+}
+
 static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 				       struct irq_cfg *irq_cfg,
 				       struct irq_alloc_info *info,
 				       int devid, int index, int sub_handle)
 {
 	struct irq_2_irte *irte_info = &data->irq_2_irte;
-	struct msi_msg *msg = &data->msi_entry;
 	struct IO_APIC_route_entry *entry;
 	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
@@ -3693,9 +3699,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 	case X86_IRQ_ALLOC_TYPE_HPET:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		msg->address_hi = MSI_ADDR_BASE_HI;
-		msg->address_lo = MSI_ADDR_BASE_LO;
-		msg->data = irte_info->index;
+		fill_msi_msg(&data->msi_entry, irte_info->index);
 		break;
 
 	default:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 15/35] PCI: vmd: Use msi_msg shadow structs
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (13 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 14/35] iommu/amd: " David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-28 20:49                       ` Kees Cook
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  2020-10-24 21:35                     ` [PATCH v3 16/35] x86/kvm: " David Woodhouse
                                       ` (20 subsequent siblings)
  35 siblings, 2 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

Use the x86 shadow structs in msi_msg instead of the macros.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/pci/controller/vmd.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index aa1b12bac9a1..72de3c6f644e 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -18,7 +18,6 @@
 #include <asm/irqdomain.h>
 #include <asm/device.h>
 #include <asm/msi.h>
-#include <asm/msidef.h>
 
 #define VMD_CFGBAR	0
 #define VMD_MEMBAR1	2
@@ -131,10 +130,10 @@ static void vmd_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	struct vmd_irq_list *irq = vmdirq->irq;
 	struct vmd_dev *vmd = irq_data_get_irq_handler_data(data);
 
-	msg->address_hi = MSI_ADDR_BASE_HI;
-	msg->address_lo = MSI_ADDR_BASE_LO |
-			  MSI_ADDR_DEST_ID(index_from_irqs(vmd, irq));
-	msg->data = 0;
+	memset(&msg, 0, sizeof(*msg);
+	msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
+	msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
+	msg->arch_addr_lo.destid_0_7 = index_from_irqs(vmd, irq);
 }
 
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 16/35] x86/kvm: Use msi_msg shadow structs
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (14 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 15/35] PCI: vmd: " David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  2020-10-24 21:35                     ` [PATCH v3 17/35] x86/pci/xen: " David Woodhouse
                                       ` (19 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

Use the bitfields in the x86 shadow structs instead of decomposing the
32bit value with macros.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/irq_comm.c | 31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 4aa1c2e00e2a..8a4de3f12820 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -16,8 +16,6 @@
 
 #include <trace/events/kvm.h>
 
-#include <asm/msidef.h>
-
 #include "irq.h"
 
 #include "ioapic.h"
@@ -104,22 +102,19 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
 		     struct kvm_lapic_irq *irq)
 {
-	trace_kvm_msi_set_irq(e->msi.address_lo | (kvm->arch.x2apic_format ?
-	                                     (u64)e->msi.address_hi << 32 : 0),
-	                      e->msi.data);
-
-	irq->dest_id = (e->msi.address_lo &
-			MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
-	if (kvm->arch.x2apic_format)
-		irq->dest_id |= MSI_ADDR_EXT_DEST_ID(e->msi.address_hi);
-	irq->vector = (e->msi.data &
-			MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
-	irq->dest_mode = kvm_lapic_irq_dest_mode(
-	    !!((1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo));
-	irq->trig_mode = (1 << MSI_DATA_TRIGGER_SHIFT) & e->msi.data;
-	irq->delivery_mode = e->msi.data & 0x700;
-	irq->msi_redir_hint = ((e->msi.address_lo
-		& MSI_ADDR_REDIRECTION_LOWPRI) > 0);
+	struct msi_msg msg = { .address_lo = e->msi.address_lo,
+			       .address_hi = e->msi.address_hi,
+			       .data = e->msi.data };
+
+	trace_kvm_msi_set_irq(msg.address_lo | (kvm->arch.x2apic_format ?
+			      (u64)msg.address_hi << 32 : 0), msg.data);
+
+	irq->dest_id = x86_msi_msg_get_destid(&msg, kvm->arch.x2apic_format);
+	irq->vector = msg.arch_data.vector;
+	irq->dest_mode = kvm_lapic_irq_dest_mode(msg.arch_addr_lo.dest_mode_logical);
+	irq->trig_mode = msg.arch_data.is_level;
+	irq->delivery_mode = msg.arch_data.delivery_mode << 8;
+	irq->msi_redir_hint = msg.arch_addr_lo.redirect_hint;
 	irq->level = 1;
 	irq->shorthand = APIC_DEST_NOSHORT;
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 17/35] x86/pci/xen: Use msi_msg shadow structs
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (15 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 16/35] x86/kvm: " David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-25  9:49                       ` David Laight
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  2020-10-24 21:35                     ` [PATCH v3 18/35] x86/msi: Remove msidef.h David Woodhouse
                                       ` (18 subsequent siblings)
  35 siblings, 2 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

Use the msi_msg shadow structs and compose the message with named bitfields
instead of the unreadable macro maze.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/pci/xen.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index c552cd2d0632..3d41a09c2c14 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -152,7 +152,6 @@ static int acpi_register_gsi_xen(struct device *dev, u32 gsi,
 
 #if defined(CONFIG_PCI_MSI)
 #include <linux/msi.h>
-#include <asm/msidef.h>
 
 struct xen_pci_frontend_ops *xen_pci_frontend;
 EXPORT_SYMBOL_GPL(xen_pci_frontend);
@@ -210,23 +209,20 @@ static int xen_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 	return ret;
 }
 
-#define XEN_PIRQ_MSI_DATA  (MSI_DATA_TRIGGER_EDGE | \
-		MSI_DATA_LEVEL_ASSERT | (3 << 8) | MSI_DATA_VECTOR(0))
-
 static void xen_msi_compose_msg(struct pci_dev *pdev, unsigned int pirq,
 		struct msi_msg *msg)
 {
-	/* We set vector == 0 to tell the hypervisor we don't care about it,
-	 * but we want a pirq setup instead.
-	 * We use the dest_id field to pass the pirq that we want. */
-	msg->address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(pirq);
-	msg->address_lo =
-		MSI_ADDR_BASE_LO |
-		MSI_ADDR_DEST_MODE_PHYSICAL |
-		MSI_ADDR_REDIRECTION_CPU |
-		MSI_ADDR_DEST_ID(pirq);
-
-	msg->data = XEN_PIRQ_MSI_DATA;
+	/*
+	 * We set vector == 0 to tell the hypervisor we don't care about
+	 * it, but we want a pirq setup instead.  We use the dest_id fields
+	 * to pass the pirq that we want.
+	 */
+	memset(msg, 0, sizeof(*msg));
+	msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
+	msg->arch_addr_hi.destid_8_31 = pirq >> 8;
+	msg->arch_addr_lo.destid_0_7 = pirq & 0xFF;
+	msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
+	msg->arch_data.delivery_mode = APIC_DELIVERY_MODE_EXTINT;
 }
 
 static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 18/35] x86/msi: Remove msidef.h
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (16 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 17/35] x86/pci/xen: " David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  2020-10-24 21:35                     ` [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers David Woodhouse
                                       ` (17 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

Nothing uses the macro maze anymore.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/msidef.h | 57 -----------------------------------
 1 file changed, 57 deletions(-)
 delete mode 100644 arch/x86/include/asm/msidef.h

diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
deleted file mode 100644
index ee2f8ccc32d0..000000000000
--- a/arch/x86/include/asm/msidef.h
+++ /dev/null
@@ -1,57 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_X86_MSIDEF_H
-#define _ASM_X86_MSIDEF_H
-
-/*
- * Constants for Intel APIC based MSI messages.
- */
-
-/*
- * Shifts for MSI data
- */
-
-#define MSI_DATA_VECTOR_SHIFT		0
-#define  MSI_DATA_VECTOR_MASK		0x000000ff
-#define	 MSI_DATA_VECTOR(v)		(((v) << MSI_DATA_VECTOR_SHIFT) & \
-					 MSI_DATA_VECTOR_MASK)
-
-#define MSI_DATA_DELIVERY_MODE_SHIFT	8
-#define  MSI_DATA_DELIVERY_FIXED	(0 << MSI_DATA_DELIVERY_MODE_SHIFT)
-#define  MSI_DATA_DELIVERY_LOWPRI	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
-
-#define MSI_DATA_LEVEL_SHIFT		14
-#define	 MSI_DATA_LEVEL_DEASSERT	(0 << MSI_DATA_LEVEL_SHIFT)
-#define	 MSI_DATA_LEVEL_ASSERT		(1 << MSI_DATA_LEVEL_SHIFT)
-
-#define MSI_DATA_TRIGGER_SHIFT		15
-#define  MSI_DATA_TRIGGER_EDGE		(0 << MSI_DATA_TRIGGER_SHIFT)
-#define  MSI_DATA_TRIGGER_LEVEL		(1 << MSI_DATA_TRIGGER_SHIFT)
-
-/*
- * Shift/mask fields for msi address
- */
-
-#define MSI_ADDR_BASE_HI		0
-#define MSI_ADDR_BASE_LO		0xfee00000
-
-#define MSI_ADDR_DEST_MODE_SHIFT	2
-#define  MSI_ADDR_DEST_MODE_PHYSICAL	(0 << MSI_ADDR_DEST_MODE_SHIFT)
-#define	 MSI_ADDR_DEST_MODE_LOGICAL	(1 << MSI_ADDR_DEST_MODE_SHIFT)
-
-#define MSI_ADDR_REDIRECTION_SHIFT	3
-#define  MSI_ADDR_REDIRECTION_CPU	(0 << MSI_ADDR_REDIRECTION_SHIFT)
-					/* dedicated cpu */
-#define  MSI_ADDR_REDIRECTION_LOWPRI	(1 << MSI_ADDR_REDIRECTION_SHIFT)
-					/* lowest priority */
-
-#define MSI_ADDR_DEST_ID_SHIFT		12
-#define	 MSI_ADDR_DEST_ID_MASK		0x00ffff0
-#define  MSI_ADDR_DEST_ID(dest)		(((dest) << MSI_ADDR_DEST_ID_SHIFT) & \
-					 MSI_ADDR_DEST_ID_MASK)
-#define MSI_ADDR_EXT_DEST_ID(dest)	((dest) & 0xffffff00)
-
-#define MSI_ADDR_IR_EXT_INT		(1 << 4)
-#define MSI_ADDR_IR_SHV			(1 << 3)
-#define MSI_ADDR_IR_INDEX1(index)	((index & 0x8000) >> 13)
-#define MSI_ADDR_IR_INDEX2(index)	((index & 0x7fff) << 5)
-#endif /* _ASM_X86_MSIDEF_H */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (17 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 18/35] x86/msi: Remove msidef.h David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  2020-11-10  6:31                       ` [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers Qian Cai
  2020-10-24 21:35                     ` [PATCH v3 20/35] x86/ioapic: Cleanup IO/APIC route entry structs David Woodhouse
                                       ` (16 subsequent siblings)
  35 siblings, 2 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

'trigger' and 'polarity' are used throughout the I/O-APIC code for handling
the trigger type (edge/level) and the active low/high configuration. While
there are defines for initializing these variables and struct members, they
are not used consequently and the meaning of 'trigger' and 'polarity' is
opaque and confusing at best.

Rename them to 'is_level' and 'active_low' and make them boolean in various
structs so it's entirely clear what the meaning is.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/hw_irq.h       |   6 +-
 arch/x86/kernel/apic/io_apic.c      | 244 +++++++++++++---------------
 arch/x86/pci/intel_mid_pci.c        |   8 +-
 drivers/iommu/amd/iommu.c           |  10 +-
 drivers/iommu/intel/irq_remapping.c |   9 +-
 5 files changed, 130 insertions(+), 147 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index a4aeeaace040..517847a94dbe 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -47,9 +47,9 @@ enum irq_alloc_type {
 struct ioapic_alloc_info {
 	int				pin;
 	int				node;
-	u32				trigger : 1;
-	u32				polarity : 1;
-	u32				valid : 1;
+	u32				is_level	: 1;
+	u32				active_low	: 1;
+	u32				valid		: 1;
 	struct IO_APIC_route_entry	*entry;
 };
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index c6d92d2570d0..24a7bba7cbf4 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -89,12 +89,12 @@ struct irq_pin_list {
 };
 
 struct mp_chip_data {
-	struct list_head irq_2_pin;
-	struct IO_APIC_route_entry entry;
-	int trigger;
-	int polarity;
+	struct list_head		irq_2_pin;
+	struct IO_APIC_route_entry	entry;
+	bool				is_level;
+	bool				active_low;
+	bool				isa_irq;
 	u32 count;
-	bool isa_irq;
 };
 
 struct mp_ioapic_gsi {
@@ -745,44 +745,7 @@ static int __init find_isa_irq_apic(int irq, int type)
 	return -1;
 }
 
-#ifdef CONFIG_EISA
-/*
- * EISA Edge/Level control register, ELCR
- */
-static int EISA_ELCR(unsigned int irq)
-{
-	if (irq < nr_legacy_irqs()) {
-		unsigned int port = 0x4d0 + (irq >> 3);
-		return (inb(port) >> (irq & 7)) & 1;
-	}
-	apic_printk(APIC_VERBOSE, KERN_INFO
-			"Broken MPtable reports ISA irq %d\n", irq);
-	return 0;
-}
-
-#endif
-
-/* ISA interrupts are always active high edge triggered,
- * when listed as conforming in the MP table. */
-
-#define default_ISA_trigger(idx)	(IOAPIC_EDGE)
-#define default_ISA_polarity(idx)	(IOAPIC_POL_HIGH)
-
-/* EISA interrupts are always polarity zero and can be edge or level
- * trigger depending on the ELCR value.  If an interrupt is listed as
- * EISA conforming in the MP table, that means its trigger type must
- * be read in from the ELCR */
-
-#define default_EISA_trigger(idx)	(EISA_ELCR(mp_irqs[idx].srcbusirq))
-#define default_EISA_polarity(idx)	default_ISA_polarity(idx)
-
-/* PCI interrupts are always active low level triggered,
- * when listed as conforming in the MP table. */
-
-#define default_PCI_trigger(idx)	(IOAPIC_LEVEL)
-#define default_PCI_polarity(idx)	(IOAPIC_POL_LOW)
-
-static int irq_polarity(int idx)
+static bool irq_active_low(int idx)
 {
 	int bus = mp_irqs[idx].srcbus;
 
@@ -791,90 +754,139 @@ static int irq_polarity(int idx)
 	 */
 	switch (mp_irqs[idx].irqflag & MP_IRQPOL_MASK) {
 	case MP_IRQPOL_DEFAULT:
-		/* conforms to spec, ie. bus-type dependent polarity */
-		if (test_bit(bus, mp_bus_not_pci))
-			return default_ISA_polarity(idx);
-		else
-			return default_PCI_polarity(idx);
+		/*
+		 * Conforms to spec, ie. bus-type dependent polarity.  PCI
+		 * defaults to low active. [E]ISA defaults to high active.
+		 */
+		return !test_bit(bus, mp_bus_not_pci);
 	case MP_IRQPOL_ACTIVE_HIGH:
-		return IOAPIC_POL_HIGH;
+		return false;
 	case MP_IRQPOL_RESERVED:
 		pr_warn("IOAPIC: Invalid polarity: 2, defaulting to low\n");
 		fallthrough;
 	case MP_IRQPOL_ACTIVE_LOW:
 	default: /* Pointless default required due to do gcc stupidity */
-		return IOAPIC_POL_LOW;
+		return true;
 	}
 }
 
 #ifdef CONFIG_EISA
-static int eisa_irq_trigger(int idx, int bus, int trigger)
+/*
+ * EISA Edge/Level control register, ELCR
+ */
+static bool EISA_ELCR(unsigned int irq)
+{
+	if (irq < nr_legacy_irqs()) {
+		unsigned int port = 0x4d0 + (irq >> 3);
+		return (inb(port) >> (irq & 7)) & 1;
+	}
+	apic_printk(APIC_VERBOSE, KERN_INFO
+			"Broken MPtable reports ISA irq %d\n", irq);
+	return false;
+}
+
+/*
+ * EISA interrupts are always active high and can be edge or level
+ * triggered depending on the ELCR value.  If an interrupt is listed as
+ * EISA conforming in the MP table, that means its trigger type must be
+ * read in from the ELCR.
+ */
+static bool eisa_irq_is_level(int idx, int bus, bool level)
 {
 	switch (mp_bus_id_to_type[bus]) {
 	case MP_BUS_PCI:
 	case MP_BUS_ISA:
-		return trigger;
+		return level;
 	case MP_BUS_EISA:
-		return default_EISA_trigger(idx);
+		return EISA_ELCR(mp_irqs[idx].srcbusirq);
 	}
 	pr_warn("IOAPIC: Invalid srcbus: %d defaulting to level\n", bus);
-	return IOAPIC_LEVEL;
+	return true;
 }
 #else
-static inline int eisa_irq_trigger(int idx, int bus, int trigger)
+static inline int eisa_irq_is_level(int idx, int bus, bool level)
 {
-	return trigger;
+	return level;
 }
 #endif
 
-static int irq_trigger(int idx)
+static bool irq_is_level(int idx)
 {
 	int bus = mp_irqs[idx].srcbus;
-	int trigger;
+	bool level;
 
 	/*
 	 * Determine IRQ trigger mode (edge or level sensitive):
 	 */
 	switch (mp_irqs[idx].irqflag & MP_IRQTRIG_MASK) {
 	case MP_IRQTRIG_DEFAULT:
-		/* conforms to spec, ie. bus-type dependent trigger mode */
-		if (test_bit(bus, mp_bus_not_pci))
-			trigger = default_ISA_trigger(idx);
-		else
-			trigger = default_PCI_trigger(idx);
+		/*
+		 * Conforms to spec, ie. bus-type dependent trigger
+		 * mode. PCI defaults to egde, ISA to level.
+		 */
+		level = test_bit(bus, mp_bus_not_pci);
 		/* Take EISA into account */
-		return eisa_irq_trigger(idx, bus, trigger);
+		return eisa_irq_is_level(idx, bus, level);
 	case MP_IRQTRIG_EDGE:
-		return IOAPIC_EDGE;
+		return false;
 	case MP_IRQTRIG_RESERVED:
 		pr_warn("IOAPIC: Invalid trigger mode 2 defaulting to level\n");
 		fallthrough;
 	case MP_IRQTRIG_LEVEL:
 	default: /* Pointless default required due to do gcc stupidity */
-		return IOAPIC_LEVEL;
+		return true;
 	}
 }
 
+static int __acpi_get_override_irq(u32 gsi, bool *trigger, bool *polarity)
+{
+	int ioapic, pin, idx;
+
+	if (skip_ioapic_setup)
+		return -1;
+
+	ioapic = mp_find_ioapic(gsi);
+	if (ioapic < 0)
+		return -1;
+
+	pin = mp_find_ioapic_pin(ioapic, gsi);
+	if (pin < 0)
+		return -1;
+
+	idx = find_irq_entry(ioapic, pin, mp_INT);
+	if (idx < 0)
+		return -1;
+
+	*trigger = irq_is_level(idx);
+	*polarity = irq_active_low(idx);
+	return 0;
+}
+
+#ifdef CONFIG_ACPI
+int acpi_get_override_irq(u32 gsi, int *is_level, int *active_low)
+{
+	*is_level = *active_low = 0;
+	return __acpi_get_override_irq(gsi, (bool *)is_level,
+				       (bool *)active_low);
+}
+#endif
+
 void ioapic_set_alloc_attr(struct irq_alloc_info *info, int node,
 			   int trigger, int polarity)
 {
 	init_irq_alloc_info(info, NULL);
 	info->type = X86_IRQ_ALLOC_TYPE_IOAPIC;
 	info->ioapic.node = node;
-	info->ioapic.trigger = trigger;
-	info->ioapic.polarity = polarity;
+	info->ioapic.is_level = trigger;
+	info->ioapic.active_low = polarity;
 	info->ioapic.valid = 1;
 }
 
-#ifndef CONFIG_ACPI
-int acpi_get_override_irq(u32 gsi, int *trigger, int *polarity);
-#endif
-
 static void ioapic_copy_alloc_attr(struct irq_alloc_info *dst,
 				   struct irq_alloc_info *src,
 				   u32 gsi, int ioapic_idx, int pin)
 {
-	int trigger, polarity;
+	bool level, pol_low;
 
 	copy_irq_alloc_info(dst, src);
 	dst->type = X86_IRQ_ALLOC_TYPE_IOAPIC;
@@ -883,20 +895,20 @@ static void ioapic_copy_alloc_attr(struct irq_alloc_info *dst,
 	dst->ioapic.valid = 1;
 	if (src && src->ioapic.valid) {
 		dst->ioapic.node = src->ioapic.node;
-		dst->ioapic.trigger = src->ioapic.trigger;
-		dst->ioapic.polarity = src->ioapic.polarity;
+		dst->ioapic.is_level = src->ioapic.is_level;
+		dst->ioapic.active_low = src->ioapic.active_low;
 	} else {
 		dst->ioapic.node = NUMA_NO_NODE;
-		if (acpi_get_override_irq(gsi, &trigger, &polarity) >= 0) {
-			dst->ioapic.trigger = trigger;
-			dst->ioapic.polarity = polarity;
+		if (__acpi_get_override_irq(gsi, &level, &pol_low) >= 0) {
+			dst->ioapic.is_level = level;
+			dst->ioapic.active_low = pol_low;
 		} else {
 			/*
 			 * PCI interrupts are always active low level
 			 * triggered.
 			 */
-			dst->ioapic.trigger = IOAPIC_LEVEL;
-			dst->ioapic.polarity = IOAPIC_POL_LOW;
+			dst->ioapic.is_level = true;
+			dst->ioapic.active_low = true;
 		}
 	}
 }
@@ -906,12 +918,12 @@ static int ioapic_alloc_attr_node(struct irq_alloc_info *info)
 	return (info && info->ioapic.valid) ? info->ioapic.node : NUMA_NO_NODE;
 }
 
-static void mp_register_handler(unsigned int irq, unsigned long trigger)
+static void mp_register_handler(unsigned int irq, bool level)
 {
 	irq_flow_handler_t hdl;
 	bool fasteoi;
 
-	if (trigger) {
+	if (level) {
 		irq_set_status_flags(irq, IRQ_LEVEL);
 		fasteoi = true;
 	} else {
@@ -933,14 +945,14 @@ static bool mp_check_pin_attr(int irq, struct irq_alloc_info *info)
 	 * pin with real trigger and polarity attributes.
 	 */
 	if (irq < nr_legacy_irqs() && data->count == 1) {
-		if (info->ioapic.trigger != data->trigger)
-			mp_register_handler(irq, info->ioapic.trigger);
-		data->entry.trigger = data->trigger = info->ioapic.trigger;
-		data->entry.polarity = data->polarity = info->ioapic.polarity;
+		if (info->ioapic.is_level != data->is_level)
+			mp_register_handler(irq, info->ioapic.is_level);
+		data->entry.trigger = data->is_level = info->ioapic.is_level;
+		data->entry.polarity = data->active_low = info->ioapic.active_low;
 	}
 
-	return data->trigger == info->ioapic.trigger &&
-	       data->polarity == info->ioapic.polarity;
+	return data->is_level == info->ioapic.is_level &&
+	       data->active_low == info->ioapic.active_low;
 }
 
 static int alloc_irq_from_domain(struct irq_domain *domain, int ioapic, u32 gsi,
@@ -2179,9 +2191,9 @@ static inline void __init check_timer(void)
 			 * so only need to unmask if it is level-trigger
 			 * do we really have level trigger timer?
 			 */
-			int idx;
-			idx = find_irq_entry(apic1, pin1, mp_INT);
-			if (idx != -1 && irq_trigger(idx))
+			int idx = find_irq_entry(apic1, pin1, mp_INT);
+
+			if (idx != -1 && irq_is_level(idx))
 				unmask_ioapic_irq(irq_get_irq_data(0));
 		}
 		irq_domain_deactivate_irq(irq_data);
@@ -2588,30 +2600,6 @@ static int io_apic_get_version(int ioapic)
 	return reg_01.bits.version;
 }
 
-int acpi_get_override_irq(u32 gsi, int *trigger, int *polarity)
-{
-	int ioapic, pin, idx;
-
-	if (skip_ioapic_setup)
-		return -1;
-
-	ioapic = mp_find_ioapic(gsi);
-	if (ioapic < 0)
-		return -1;
-
-	pin = mp_find_ioapic_pin(ioapic, gsi);
-	if (pin < 0)
-		return -1;
-
-	idx = find_irq_entry(ioapic, pin, mp_INT);
-	if (idx < 0)
-		return -1;
-
-	*trigger = irq_trigger(idx);
-	*polarity = irq_polarity(idx);
-	return 0;
-}
-
 /*
  * This function updates target affinity of IOAPIC interrupts to include
  * the CPUs which came online during SMP bringup.
@@ -2935,13 +2923,13 @@ static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data,
 				  struct irq_alloc_info *info)
 {
 	if (info && info->ioapic.valid) {
-		data->trigger = info->ioapic.trigger;
-		data->polarity = info->ioapic.polarity;
-	} else if (acpi_get_override_irq(gsi, &data->trigger,
-					 &data->polarity) < 0) {
+		data->is_level = info->ioapic.is_level;
+		data->active_low = info->ioapic.active_low;
+	} else if (__acpi_get_override_irq(gsi, &data->is_level,
+					   &data->active_low) < 0) {
 		/* PCI interrupts are always active low level triggered. */
-		data->trigger = IOAPIC_LEVEL;
-		data->polarity = IOAPIC_POL_LOW;
+		data->is_level = true;
+		data->active_low = true;
 	}
 }
 
@@ -2953,16 +2941,13 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 	entry->dest_mode     = apic->dest_mode_logical;
 	entry->dest	     = cfg->dest_apicid;
 	entry->vector	     = cfg->vector;
-	entry->trigger	     = data->trigger;
-	entry->polarity	     = data->polarity;
+	entry->trigger	     = data->is_level;
+	entry->polarity	     = data->active_low;
 	/*
 	 * Mask level triggered irqs. Edge triggered irqs are masked
 	 * by the irq core code in case they fire.
 	 */
-	if (data->trigger == IOAPIC_LEVEL)
-		entry->mask = IOAPIC_MASKED;
-	else
-		entry->mask = IOAPIC_UNMASKED;
+	entry->mask = data->is_level;
 }
 
 int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
@@ -3010,7 +2995,7 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	local_irq_save(flags);
 	if (info->ioapic.entry)
 		mp_setup_entry(cfg, data, info->ioapic.entry);
-	mp_register_handler(virq, data->trigger);
+	mp_register_handler(virq, data->is_level);
 	if (virq < nr_legacy_irqs())
 		legacy_pic->mask(virq);
 	local_irq_restore(flags);
@@ -3018,7 +3003,8 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	apic_printk(APIC_VERBOSE, KERN_DEBUG
 		    "IOAPIC[%d]: Set routing entry (%d-%d -> 0x%x -> IRQ %d Mode:%i Active:%i Dest:%d)\n",
 		    ioapic, mpc_ioapic_id(ioapic), pin, cfg->vector,
-		    virq, data->trigger, data->polarity, cfg->dest_apicid);
+		    virq, data->is_level, data->active_low,
+		    cfg->dest_apicid);
 
 	return 0;
 }
diff --git a/arch/x86/pci/intel_mid_pci.c b/arch/x86/pci/intel_mid_pci.c
index 00c62115f39c..3709afae7c77 100644
--- a/arch/x86/pci/intel_mid_pci.c
+++ b/arch/x86/pci/intel_mid_pci.c
@@ -214,7 +214,7 @@ static int pci_write(struct pci_bus *bus, unsigned int devfn, int where,
 static int intel_mid_pci_irq_enable(struct pci_dev *dev)
 {
 	struct irq_alloc_info info;
-	int polarity;
+	bool polarity_low;
 	int ret;
 	u8 gsi;
 
@@ -229,7 +229,7 @@ static int intel_mid_pci_irq_enable(struct pci_dev *dev)
 
 	switch (intel_mid_identify_cpu()) {
 	case INTEL_MID_CPU_CHIP_TANGIER:
-		polarity = IOAPIC_POL_HIGH;
+		polarity_low = false;
 
 		/* Special treatment for IRQ0 */
 		if (gsi == 0) {
@@ -251,11 +251,11 @@ static int intel_mid_pci_irq_enable(struct pci_dev *dev)
 		}
 		break;
 	default:
-		polarity = IOAPIC_POL_LOW;
+		polarity_low = true;
 		break;
 	}
 
-	ioapic_set_alloc_attr(&info, dev_to_node(&dev->dev), 1, polarity);
+	ioapic_set_alloc_attr(&info, dev_to_node(&dev->dev), 1, polarity_low);
 
 	/*
 	 * MRST only have IOAPIC, the PCI irq lines are 1:1 mapped to
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 473de0920b64..b0e5210e53b2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3687,13 +3687,11 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 		entry = info->ioapic.entry;
 		info->ioapic.entry = NULL;
 		memset(entry, 0, sizeof(*entry));
-		entry->vector        = index;
-		entry->mask          = 0;
-		entry->trigger       = info->ioapic.trigger;
-		entry->polarity      = info->ioapic.polarity;
+		entry->vector	= index;
+		entry->trigger	= info->ioapic.is_level;
+		entry->polarity	= info->ioapic.active_low;
 		/* Mask level triggered irqs. */
-		if (info->ioapic.trigger)
-			entry->mask = 1;
+		entry->mask	= info->ioapic.is_level;
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 30269b738fa5..54ca69333445 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1306,11 +1306,10 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 		 * irq handler will do the explicit EOI to the io-apic.
 		 */
 		entry->vector	= info->ioapic.pin;
-		entry->mask	= 0;			/* enable IRQ */
-		entry->trigger	= info->ioapic.trigger;
-		entry->polarity	= info->ioapic.polarity;
-		if (info->ioapic.trigger)
-			entry->mask = 1; /* Mask level triggered irqs. */
+		entry->trigger	= info->ioapic.is_level;
+		entry->polarity	= info->ioapic.active_low;
+		/* Mask level triggered irqs. */
+		entry->mask	= info->ioapic.is_level;
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 20/35] x86/ioapic: Cleanup IO/APIC route entry structs
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (18 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  2020-10-24 21:35                     ` [PATCH v3 21/35] x86/ioapic: Generate RTE directly from parent irqchip's MSI message David Woodhouse
                                       ` (15 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de>

Having two seperate structs for the I/O-APIC RTE entries (non-remapped and
DMAR remapped) requires type casts and makes it hard to map.

Combine them in IO_APIC_routing_entry by defining a union of two 64bit
bitfields. Use naming which reflects which bits are shared and which bits
are actually different for the operating modes.

[dwmw2: Fix it up and finish the job, pulling the 32-bit w1,w2 words for
        register access into the same union and eliminating a few more
        places where bits were accessed through masks and shifts.]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/io_apic.h      |  78 ++++++---------
 arch/x86/kernel/apic/io_apic.c      | 144 +++++++++++++---------------
 drivers/iommu/amd/iommu.c           |   8 +-
 drivers/iommu/hyperv-iommu.c        |   4 +-
 drivers/iommu/intel/irq_remapping.c |  19 ++--
 5 files changed, 108 insertions(+), 145 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index a1a26f6d3aa4..73da644b2f0d 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -13,15 +13,6 @@
  * Copyright (C) 1997, 1998, 1999, 2000 Ingo Molnar
  */
 
-/* I/O Unit Redirection Table */
-#define IO_APIC_REDIR_VECTOR_MASK	0x000FF
-#define IO_APIC_REDIR_DEST_LOGICAL	0x00800
-#define IO_APIC_REDIR_DEST_PHYSICAL	0x00000
-#define IO_APIC_REDIR_SEND_PENDING	(1 << 12)
-#define IO_APIC_REDIR_REMOTE_IRR	(1 << 14)
-#define IO_APIC_REDIR_LEVEL_TRIGGER	(1 << 15)
-#define IO_APIC_REDIR_MASKED		(1 << 16)
-
 /*
  * The structure of the IO-APIC:
  */
@@ -65,52 +56,39 @@ union IO_APIC_reg_03 {
 };
 
 struct IO_APIC_route_entry {
-	__u32	vector		:  8,
-		delivery_mode	:  3,	/* 000: FIXED
-					 * 001: lowest prio
-					 * 111: ExtINT
-					 */
-		dest_mode	:  1,	/* 0: physical, 1: logical */
-		delivery_status	:  1,
-		polarity	:  1,
-		irr		:  1,
-		trigger		:  1,	/* 0: edge, 1: level */
-		mask		:  1,	/* 0: enabled, 1: disabled */
-		__reserved_2	: 15;
-
-	__u32	__reserved_3	: 24,
-		dest		:  8;
-} __attribute__ ((packed));
-
-struct IR_IO_APIC_route_entry {
-	__u64	vector		: 8,
-		zero		: 3,
-		index2		: 1,
-		delivery_status : 1,
-		polarity	: 1,
-		irr		: 1,
-		trigger		: 1,
-		mask		: 1,
-		reserved	: 31,
-		format		: 1,
-		index		: 15;
+	union {
+		struct {
+			u64	vector			:  8,
+				delivery_mode		:  3,
+				dest_mode_logical	:  1,
+				delivery_status		:  1,
+				active_low		:  1,
+				irr			:  1,
+				is_level		:  1,
+				masked			:  1,
+				reserved_0		: 15,
+				reserved_1		: 24,
+				destid_0_7		:  8;
+		};
+		struct {
+			u64	ir_shared_0		:  8,
+				ir_zero			:  3,
+				ir_index_15		:  1,
+				ir_shared_1		:  5,
+				ir_reserved_0		: 31,
+				ir_format		:  1,
+				ir_index_0_14		: 15;
+		};
+		struct {
+			u64	w1			: 32,
+				w2			: 32;
+		};
+	};
 } __attribute__ ((packed));
 
 struct irq_alloc_info;
 struct ioapic_domain_cfg;
 
-#define IOAPIC_EDGE			0
-#define IOAPIC_LEVEL			1
-
-#define IOAPIC_MASKED			1
-#define IOAPIC_UNMASKED			0
-
-#define IOAPIC_POL_HIGH			0
-#define IOAPIC_POL_LOW			1
-
-#define IOAPIC_DEST_MODE_PHYSICAL	0
-#define IOAPIC_DEST_MODE_LOGICAL	1
-
 #define	IOAPIC_MAP_ALLOC		0x1
 #define	IOAPIC_MAP_CHECK		0x2
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 24a7bba7cbf4..07e754131854 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -286,31 +286,26 @@ static void io_apic_write(unsigned int apic, unsigned int reg,
 	writel(value, &io_apic->data);
 }
 
-union entry_union {
-	struct { u32 w1, w2; };
-	struct IO_APIC_route_entry entry;
-};
-
 static struct IO_APIC_route_entry __ioapic_read_entry(int apic, int pin)
 {
-	union entry_union eu;
+	struct IO_APIC_route_entry entry;
 
-	eu.w1 = io_apic_read(apic, 0x10 + 2 * pin);
-	eu.w2 = io_apic_read(apic, 0x11 + 2 * pin);
+	entry.w1 = io_apic_read(apic, 0x10 + 2 * pin);
+	entry.w2 = io_apic_read(apic, 0x11 + 2 * pin);
 
-	return eu.entry;
+	return entry;
 }
 
 static struct IO_APIC_route_entry ioapic_read_entry(int apic, int pin)
 {
-	union entry_union eu;
+	struct IO_APIC_route_entry entry;
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
-	eu.entry = __ioapic_read_entry(apic, pin);
+	entry = __ioapic_read_entry(apic, pin);
 	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 
-	return eu.entry;
+	return entry;
 }
 
 /*
@@ -321,11 +316,8 @@ static struct IO_APIC_route_entry ioapic_read_entry(int apic, int pin)
  */
 static void __ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
 {
-	union entry_union eu = {{0, 0}};
-
-	eu.entry = e;
-	io_apic_write(apic, 0x11 + 2*pin, eu.w2);
-	io_apic_write(apic, 0x10 + 2*pin, eu.w1);
+	io_apic_write(apic, 0x11 + 2*pin, e.w2);
+	io_apic_write(apic, 0x10 + 2*pin, e.w1);
 }
 
 static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
@@ -344,12 +336,12 @@ static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
  */
 static void ioapic_mask_entry(int apic, int pin)
 {
+	struct IO_APIC_route_entry e = { .masked = true };
 	unsigned long flags;
-	union entry_union eu = { .entry.mask = IOAPIC_MASKED };
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
-	io_apic_write(apic, 0x10 + 2*pin, eu.w1);
-	io_apic_write(apic, 0x11 + 2*pin, eu.w2);
+	io_apic_write(apic, 0x10 + 2*pin, e.w1);
+	io_apic_write(apic, 0x11 + 2*pin, e.w2);
 	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
@@ -422,20 +414,15 @@ static void __init replace_pin_at_irq_node(struct mp_chip_data *data, int node,
 	add_pin_to_irq_node(data, node, newapic, newpin);
 }
 
-static void io_apic_modify_irq(struct mp_chip_data *data,
-			       int mask_and, int mask_or,
+static void io_apic_modify_irq(struct mp_chip_data *data, bool masked,
 			       void (*final)(struct irq_pin_list *entry))
 {
-	union entry_union eu;
 	struct irq_pin_list *entry;
 
-	eu.entry = data->entry;
-	eu.w1 &= mask_and;
-	eu.w1 |= mask_or;
-	data->entry = eu.entry;
+	data->entry.masked = masked;
 
 	for_each_irq_pin(entry, data->irq_2_pin) {
-		io_apic_write(entry->apic, 0x10 + 2 * entry->pin, eu.w1);
+		io_apic_write(entry->apic, 0x10 + 2 * entry->pin, data->entry.w1);
 		if (final)
 			final(entry);
 	}
@@ -459,13 +446,13 @@ static void mask_ioapic_irq(struct irq_data *irq_data)
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
-	io_apic_modify_irq(data, ~0, IO_APIC_REDIR_MASKED, &io_apic_sync);
+	io_apic_modify_irq(data, true, &io_apic_sync);
 	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
 static void __unmask_ioapic(struct mp_chip_data *data)
 {
-	io_apic_modify_irq(data, ~IO_APIC_REDIR_MASKED, 0, NULL);
+	io_apic_modify_irq(data, false, NULL);
 }
 
 static void unmask_ioapic_irq(struct irq_data *irq_data)
@@ -506,8 +493,8 @@ static void __eoi_ioapic_pin(int apic, int pin, int vector)
 		/*
 		 * Mask the entry and change the trigger mode to edge.
 		 */
-		entry1.mask = IOAPIC_MASKED;
-		entry1.trigger = IOAPIC_EDGE;
+		entry1.masked = true;
+		entry1.is_level = false;
 
 		__ioapic_write_entry(apic, pin, entry1);
 
@@ -542,8 +529,8 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 	 * Make sure the entry is masked and re-read the contents to check
 	 * if it is a level triggered pin and if the remote-IRR is set.
 	 */
-	if (entry.mask == IOAPIC_UNMASKED) {
-		entry.mask = IOAPIC_MASKED;
+	if (!entry.masked) {
+		entry.masked = true;
 		ioapic_write_entry(apic, pin, entry);
 		entry = ioapic_read_entry(apic, pin);
 	}
@@ -556,8 +543,8 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 		 * doesn't clear the remote-IRR if the trigger mode is not
 		 * set to level.
 		 */
-		if (entry.trigger == IOAPIC_EDGE) {
-			entry.trigger = IOAPIC_LEVEL;
+		if (!entry.is_level) {
+			entry.is_level = true;
 			ioapic_write_entry(apic, pin, entry);
 		}
 		raw_spin_lock_irqsave(&ioapic_lock, flags);
@@ -659,8 +646,8 @@ void mask_ioapic_entries(void)
 			struct IO_APIC_route_entry entry;
 
 			entry = ioapics[apic].saved_registers[pin];
-			if (entry.mask == IOAPIC_UNMASKED) {
-				entry.mask = IOAPIC_MASKED;
+			if (!entry.masked) {
+				entry.masked = true;
 				ioapic_write_entry(apic, pin, entry);
 			}
 		}
@@ -947,8 +934,8 @@ static bool mp_check_pin_attr(int irq, struct irq_alloc_info *info)
 	if (irq < nr_legacy_irqs() && data->count == 1) {
 		if (info->ioapic.is_level != data->is_level)
 			mp_register_handler(irq, info->ioapic.is_level);
-		data->entry.trigger = data->is_level = info->ioapic.is_level;
-		data->entry.polarity = data->active_low = info->ioapic.active_low;
+		data->entry.is_level = data->is_level = info->ioapic.is_level;
+		data->entry.active_low = data->active_low = info->ioapic.active_low;
 	}
 
 	return data->is_level == info->ioapic.is_level &&
@@ -1231,10 +1218,9 @@ void ioapic_zap_locks(void)
 
 static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
 {
-	int i;
-	char buf[256];
 	struct IO_APIC_route_entry entry;
-	struct IR_IO_APIC_route_entry *ir_entry = (void *)&entry;
+	char buf[256];
+	int i;
 
 	printk(KERN_DEBUG "IOAPIC %d:\n", apic);
 	for (i = 0; i <= nr_entries; i++) {
@@ -1242,20 +1228,20 @@ static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
 		snprintf(buf, sizeof(buf),
 			 " pin%02x, %s, %s, %s, V(%02X), IRR(%1d), S(%1d)",
 			 i,
-			 entry.mask == IOAPIC_MASKED ? "disabled" : "enabled ",
-			 entry.trigger == IOAPIC_LEVEL ? "level" : "edge ",
-			 entry.polarity == IOAPIC_POL_LOW ? "low " : "high",
+			 entry.masked ? "disabled" : "enabled ",
+			 entry.is_level ? "level" : "edge ",
+			 entry.active_low ? "low " : "high",
 			 entry.vector, entry.irr, entry.delivery_status);
-		if (ir_entry->format)
+		if (entry.ir_format) {
 			printk(KERN_DEBUG "%s, remapped, I(%04X),  Z(%X)\n",
-			       buf, (ir_entry->index2 << 15) | ir_entry->index,
-			       ir_entry->zero);
-		else
-			printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n",
 			       buf,
-			       entry.dest_mode == IOAPIC_DEST_MODE_LOGICAL ?
-			       "logical " : "physical",
-			       entry.dest, entry.delivery_mode);
+			       (entry.ir_index_15 << 15) | entry.ir_index_0_14,
+				entry.ir_zero);
+		} else {
+			printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n", buf,
+			       entry.dest_mode_logical ? "logical " : "physical",
+			       entry.destid_0_7, entry.delivery_mode);
+		}
 	}
 }
 
@@ -1380,8 +1366,8 @@ void __init enable_IO_APIC(void)
 		/* If the interrupt line is enabled and in ExtInt mode
 		 * I have found the pin where the i8259 is connected.
 		 */
-		if ((entry.mask == 0) &&
-		    (entry.delivery_mode == APIC_DELIVERY_MODE_EXTINT)) {
+		if (!entry.masked &&
+		    entry.delivery_mode == APIC_DELIVERY_MODE_EXTINT) {
 			ioapic_i8259.apic = apic;
 			ioapic_i8259.pin  = pin;
 			goto found_i8259;
@@ -1425,12 +1411,12 @@ void native_restore_boot_irq_mode(void)
 		struct IO_APIC_route_entry entry;
 
 		memset(&entry, 0, sizeof(entry));
-		entry.mask		= IOAPIC_UNMASKED;
-		entry.trigger		= IOAPIC_EDGE;
-		entry.polarity		= IOAPIC_POL_HIGH;
-		entry.dest_mode		= IOAPIC_DEST_MODE_PHYSICAL;
+		entry.masked		= false;
+		entry.is_level		= false;
+		entry.active_low	= false;
+		entry.dest_mode_logical	= false;
 		entry.delivery_mode	= APIC_DELIVERY_MODE_EXTINT;
-		entry.dest		= read_apic_id();
+		entry.destid_0_7	= read_apic_id();
 
 		/*
 		 * Add it to the IO-APIC irq-routing table:
@@ -1709,13 +1695,13 @@ static bool io_apic_level_ack_pending(struct mp_chip_data *data)
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
 	for_each_irq_pin(entry, data->irq_2_pin) {
-		unsigned int reg;
+		struct IO_APIC_route_entry e;
 		int pin;
 
 		pin = entry->pin;
-		reg = io_apic_read(entry->apic, 0x10 + pin*2);
+		e.w1 = io_apic_read(entry->apic, 0x10 + pin*2);
 		/* Is the remote IRR bit set? */
-		if (reg & IO_APIC_REDIR_REMOTE_IRR) {
+		if (e.irr) {
 			raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 			return true;
 		}
@@ -1874,7 +1860,7 @@ static void ioapic_configure_entry(struct irq_data *irqd)
 	 * ioapic chip to verify that.
 	 */
 	if (irqd->chip == &ioapic_chip) {
-		mpd->entry.dest = cfg->dest_apicid;
+		mpd->entry.destid_0_7 = cfg->dest_apicid;
 		mpd->entry.vector = cfg->vector;
 	}
 	for_each_irq_pin(entry, mpd->irq_2_pin)
@@ -1932,7 +1918,7 @@ static int ioapic_irq_get_chip_state(struct irq_data *irqd,
 		 * irrelevant because the IO-APIC treats them as fire and
 		 * forget.
 		 */
-		if (rentry.irr && rentry.trigger) {
+		if (rentry.irr && rentry.is_level) {
 			*state = true;
 			break;
 		}
@@ -2057,12 +2043,12 @@ static inline void __init unlock_ExtINT_logic(void)
 
 	memset(&entry1, 0, sizeof(entry1));
 
-	entry1.dest_mode = IOAPIC_DEST_MODE_PHYSICAL;
-	entry1.mask = IOAPIC_UNMASKED;
-	entry1.dest = hard_smp_processor_id();
-	entry1.delivery_mode = APIC_DELIVERY_MODE_EXTINT;
-	entry1.polarity = entry0.polarity;
-	entry1.trigger = IOAPIC_EDGE;
+	entry1.dest_mode_logical	= true;
+	entry1.masked			= false;
+	entry1.destid_0_7		= hard_smp_processor_id();
+	entry1.delivery_mode		= APIC_DELIVERY_MODE_EXTINT;
+	entry1.active_low		= entry0.active_low;
+	entry1.is_level			= false;
 	entry1.vector = 0;
 
 	ioapic_write_entry(apic, pin, entry1);
@@ -2937,17 +2923,17 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 			   struct IO_APIC_route_entry *entry)
 {
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode = apic->delivery_mode;
-	entry->dest_mode     = apic->dest_mode_logical;
-	entry->dest	     = cfg->dest_apicid;
-	entry->vector	     = cfg->vector;
-	entry->trigger	     = data->is_level;
-	entry->polarity	     = data->active_low;
+	entry->delivery_mode	 = apic->delivery_mode;
+	entry->dest_mode_logical = apic->dest_mode_logical;
+	entry->destid_0_7	 = cfg->dest_apicid;
+	entry->vector		 = cfg->vector;
+	entry->is_level		 = data->is_level;
+	entry->active_low	 = data->active_low;
 	/*
 	 * Mask level triggered irqs. Edge triggered irqs are masked
 	 * by the irq core code in case they fire.
 	 */
-	entry->mask = data->is_level;
+	entry->masked		= data->is_level;
 }
 
 int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b0e5210e53b2..3d72ec7bbbf8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3687,11 +3687,11 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 		entry = info->ioapic.entry;
 		info->ioapic.entry = NULL;
 		memset(entry, 0, sizeof(*entry));
-		entry->vector	= index;
-		entry->trigger	= info->ioapic.is_level;
-		entry->polarity	= info->ioapic.active_low;
+		entry->vector		= index;
+		entry->is_level		= info->ioapic.is_level;
+		entry->active_low	= info->ioapic.active_low;
 		/* Mask level triggered irqs. */
-		entry->mask	= info->ioapic.is_level;
+		entry->masked		= info->ioapic.is_level;
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e09e2d734c57..1ab7eb918a5c 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -52,7 +52,7 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 		return ret;
 
 	entry = data->chip_data;
-	entry->dest = cfg->dest_apicid;
+	entry->destid_0_7 = cfg->dest_apicid;
 	entry->vector = cfg->vector;
 	send_cleanup_vector(cfg);
 
@@ -125,7 +125,7 @@ static int hyperv_irq_remapping_activate(struct irq_domain *domain,
 	struct irq_cfg *cfg = irqd_cfg(irq_data);
 	struct IO_APIC_route_entry *entry = irq_data->chip_data;
 
-	entry->dest = cfg->dest_apicid;
+	entry->destid_0_7 = cfg->dest_apicid;
 	entry->vector = cfg->vector;
 
 	return 0;
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 54ca69333445..625bdb9f1627 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1279,8 +1279,8 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 					     struct irq_alloc_info *info,
 					     int index, int sub_handle)
 {
-	struct IR_IO_APIC_route_entry *entry;
 	struct irte *irte = &data->irte_entry;
+	struct IO_APIC_route_entry *entry;
 
 	prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
 	switch (info->type) {
@@ -1294,22 +1294,21 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 			irte->avail, irte->vector, irte->dest_id,
 			irte->sid, irte->sq, irte->svt);
 
-		entry = (struct IR_IO_APIC_route_entry *)info->ioapic.entry;
+		entry = info->ioapic.entry;
 		info->ioapic.entry = NULL;
 		memset(entry, 0, sizeof(*entry));
-		entry->index2	= (index >> 15) & 0x1;
-		entry->zero	= 0;
-		entry->format	= 1;
-		entry->index	= (index & 0x7fff);
+		entry->ir_index_15	= !!(index & 0x8000);
+		entry->ir_format	= true;
+		entry->ir_index_0_14	= index & 0x7fff;
 		/*
 		 * IO-APIC RTE will be configured with virtual vector.
 		 * irq handler will do the explicit EOI to the io-apic.
 		 */
-		entry->vector	= info->ioapic.pin;
-		entry->trigger	= info->ioapic.is_level;
-		entry->polarity	= info->ioapic.active_low;
+		entry->vector		= info->ioapic.pin;
+		entry->is_level		= info->ioapic.is_level;
+		entry->active_low	= info->ioapic.active_low;
 		/* Mask level triggered irqs. */
-		entry->mask	= info->ioapic.is_level;
+		entry->masked		= info->ioapic.is_level;
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 21/35] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (19 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 20/35] x86/ioapic: Cleanup IO/APIC route entry structs David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 22/35] genirq/irqdomain: Implement get_name() method on irqchip fwnodes David Woodhouse
                                       ` (14 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

The I/O-APIC generates an MSI cycle with address/data bits taken from its
Redirection Table Entry in some combination which used to make sense, but
now is just a bunch of bits which get passed through in some seemingly
arbitrary order.

Instead of making IRQ remapping drivers directly frob the I/OA-PIC RTE, let
them just do their job and generate an MSI message. The bit swizzling to
turn that MSI message into the I/O-APIC's RTE is the same in all cases,
since it's a function of the I/O-APIC hardware. The IRQ remappers have no
real need to get involved with that.

The only slight caveat is that the I/OAPIC is interpreting some of those
fields too, and it does want the 'vector' field to be unique to make EOI
work. The AMD IOMMU happens to put its IRTE index in the bits that the
I/O-APIC thinks are the vector field, and accommodates this requirement by
reserving the first 32 indices for the I/O-APIC.  The Intel IOMMU doesn't
actually use the bits that the I/O-APIC thinks are the vector field, so it
fills in the 'pin' value there instead.

[ tglx: Replaced the unreadably macro maze with the cleaned up RTE/msi_msg
  	bitfields and added commentry to explain the mapping magic ]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/hw_irq.h       |  11 ++-
 arch/x86/kernel/apic/io_apic.c      | 103 +++++++++++++++++++---------
 drivers/iommu/amd/iommu.c           |  12 ----
 drivers/iommu/hyperv-iommu.c        |  31 ---------
 drivers/iommu/intel/irq_remapping.c |  31 ++-------
 5 files changed, 83 insertions(+), 105 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 517847a94dbe..83a69f62637e 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -45,12 +45,11 @@ enum irq_alloc_type {
 };
 
 struct ioapic_alloc_info {
-	int				pin;
-	int				node;
-	u32				is_level	: 1;
-	u32				active_low	: 1;
-	u32				valid		: 1;
-	struct IO_APIC_route_entry	*entry;
+	int		pin;
+	int		node;
+	u32		is_level	: 1;
+	u32		active_low	: 1;
+	u32		valid		: 1;
 };
 
 struct uv_alloc_info {
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 07e754131854..ea983da1a57f 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -48,6 +48,7 @@
 #include <linux/jiffies.h>	/* time_after() */
 #include <linux/slab.h>
 #include <linux/memblock.h>
+#include <linux/msi.h>
 
 #include <asm/irqdomain.h>
 #include <asm/io.h>
@@ -63,7 +64,6 @@
 #include <asm/setup.h>
 #include <asm/irq_remapping.h>
 #include <asm/hw_irq.h>
-
 #include <asm/apic.h>
 
 #define	for_each_ioapic(idx)		\
@@ -1848,21 +1848,58 @@ static void ioapic_ir_ack_level(struct irq_data *irq_data)
 	eoi_ioapic_pin(data->entry.vector, data);
 }
 
+/*
+ * The I/OAPIC is just a device for generating MSI messages from legacy
+ * interrupt pins. Various fields of the RTE translate into bits of the
+ * resulting MSI which had a historical meaning.
+ *
+ * With interrupt remapping, many of those bits have different meanings
+ * in the underlying MSI, but the way that the I/OAPIC transforms them
+ * from its RTE to the MSI message is the same. This function allows
+ * the parent IRQ domain to compose the MSI message, then takes the
+ * relevant bits to put them in the appropriate places in the RTE in
+ * order to generate that message when the IRQ happens.
+ *
+ * The setup here relies on a preconfigured route entry (is_level,
+ * active_low, masked) because the parent domain is merely composing the
+ * generic message routing information which is used for the MSI.
+ */
+static void ioapic_setup_msg_from_msi(struct irq_data *irq_data,
+				      struct IO_APIC_route_entry *entry)
+{
+	struct msi_msg msg;
+
+	/* Let the parent domain compose the MSI message */
+	irq_chip_compose_msi_msg(irq_data, &msg);
+
+	/*
+	 * - Real vector
+	 * - DMAR/IR: 8bit subhandle (ioapic.pin)
+	 * - AMD/IR:  8bit IRTE index
+	 */
+	entry->vector			= msg.arch_data.vector;
+	/* Delivery mode (for DMAR/IR all 0) */
+	entry->delivery_mode		= msg.arch_data.delivery_mode;
+	/* Destination mode or DMAR/IR index bit 15 */
+	entry->dest_mode_logical	= msg.arch_addr_lo.dest_mode_logical;
+	/* DMAR/IR: 1, 0 for all other modes */
+	entry->ir_format		= msg.arch_addr_lo.dmar_format;
+	/*
+	 * DMAR/IR: index bit 0-14.
+	 *
+	 * All other modes have bit 0-6 of dmar_index_0_14 cleared and the
+	 * topmost 8 bits are destination id bit 0-7 (entry::destid_0_7).
+	 */
+	entry->ir_index_0_14		= msg.arch_addr_lo.dmar_index_0_14;
+}
+
 static void ioapic_configure_entry(struct irq_data *irqd)
 {
 	struct mp_chip_data *mpd = irqd->chip_data;
-	struct irq_cfg *cfg = irqd_cfg(irqd);
 	struct irq_pin_list *entry;
 
-	/*
-	 * Only update when the parent is the vector domain, don't touch it
-	 * if the parent is the remapping domain. Check the installed
-	 * ioapic chip to verify that.
-	 */
-	if (irqd->chip == &ioapic_chip) {
-		mpd->entry.destid_0_7 = cfg->dest_apicid;
-		mpd->entry.vector = cfg->vector;
-	}
+	ioapic_setup_msg_from_msi(irqd, &mpd->entry);
+
 	for_each_irq_pin(entry, mpd->irq_2_pin)
 		__ioapic_write_entry(entry->apic, entry->pin, mpd->entry);
 }
@@ -2919,14 +2956,23 @@ static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data,
 	}
 }
 
-static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
-			   struct IO_APIC_route_entry *entry)
+/*
+ * Configure the I/O-APIC specific fields in the routing entry.
+ *
+ * This is important to setup the I/O-APIC specific bits (is_level,
+ * active_low, masked) because the underlying parent domain will only
+ * provide the routing information and is oblivious of the I/O-APIC
+ * specific bits.
+ *
+ * The entry is just preconfigured at this point and not written into the
+ * RTE. This happens later during activation which will fill in the actual
+ * routing information.
+ */
+static void mp_preconfigure_entry(struct mp_chip_data *data)
 {
+	struct IO_APIC_route_entry *entry = &data->entry;
+
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode	 = apic->delivery_mode;
-	entry->dest_mode_logical = apic->dest_mode_logical;
-	entry->destid_0_7	 = cfg->dest_apicid;
-	entry->vector		 = cfg->vector;
 	entry->is_level		 = data->is_level;
 	entry->active_low	 = data->active_low;
 	/*
@@ -2939,11 +2985,10 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 		       unsigned int nr_irqs, void *arg)
 {
-	int ret, ioapic, pin;
-	struct irq_cfg *cfg;
-	struct irq_data *irq_data;
-	struct mp_chip_data *data;
 	struct irq_alloc_info *info = arg;
+	struct mp_chip_data *data;
+	struct irq_data *irq_data;
+	int ret, ioapic, pin;
 	unsigned long flags;
 
 	if (!info || nr_irqs > 1)
@@ -2961,7 +3006,6 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	if (!data)
 		return -ENOMEM;
 
-	info->ioapic.entry = &data->entry;
 	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info);
 	if (ret < 0) {
 		kfree(data);
@@ -2975,23 +3019,20 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	irq_data->chip_data = data;
 	mp_irqdomain_get_attr(mp_pin_to_gsi(ioapic, pin), data, info);
 
-	cfg = irqd_cfg(irq_data);
 	add_pin_to_irq_node(data, ioapic_alloc_attr_node(info), ioapic, pin);
 
-	local_irq_save(flags);
-	if (info->ioapic.entry)
-		mp_setup_entry(cfg, data, info->ioapic.entry);
+	mp_preconfigure_entry(data);
 	mp_register_handler(virq, data->is_level);
+
+	local_irq_save(flags);
 	if (virq < nr_legacy_irqs())
 		legacy_pic->mask(virq);
 	local_irq_restore(flags);
 
 	apic_printk(APIC_VERBOSE, KERN_DEBUG
-		    "IOAPIC[%d]: Set routing entry (%d-%d -> 0x%x -> IRQ %d Mode:%i Active:%i Dest:%d)\n",
-		    ioapic, mpc_ioapic_id(ioapic), pin, cfg->vector,
-		    virq, data->is_level, data->active_low,
-		    cfg->dest_apicid);
-
+		    "IOAPIC[%d]: Preconfigured routing entry (%d-%d -> IRQ %d Level:%i ActiveLow:%i)\n",
+		    ioapic, mpc_ioapic_id(ioapic), pin, virq,
+		    data->is_level, data->active_low);
 	return 0;
 }
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3d72ec7bbbf8..9744cdbefd88 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3669,7 +3669,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 				       int devid, int index, int sub_handle)
 {
 	struct irq_2_irte *irte_info = &data->irq_2_irte;
-	struct IO_APIC_route_entry *entry;
 	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
 	if (!iommu)
@@ -3683,17 +3682,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-		/* Setup IOAPIC entry */
-		entry = info->ioapic.entry;
-		info->ioapic.entry = NULL;
-		memset(entry, 0, sizeof(*entry));
-		entry->vector		= index;
-		entry->is_level		= info->ioapic.is_level;
-		entry->active_low	= info->ioapic.active_low;
-		/* Mask level triggered irqs. */
-		entry->masked		= info->ioapic.is_level;
-		break;
-
 	case X86_IRQ_ALLOC_TYPE_HPET:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 1ab7eb918a5c..37dd485a5640 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -40,7 +40,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 {
 	struct irq_data *parent = data->parent_data;
 	struct irq_cfg *cfg = irqd_cfg(data);
-	struct IO_APIC_route_entry *entry;
 	int ret;
 
 	/* Return error If new irq affinity is out of ioapic_max_cpumask. */
@@ -51,9 +50,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
 		return ret;
 
-	entry = data->chip_data;
-	entry->destid_0_7 = cfg->dest_apicid;
-	entry->vector = cfg->vector;
 	send_cleanup_vector(cfg);
 
 	return 0;
@@ -89,20 +85,6 @@ static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
 
 	irq_data->chip = &hyperv_ir_chip;
 
-	/*
-	 * If there is interrupt remapping function of IOMMU, setting irq
-	 * affinity only needs to change IRTE of IOMMU. But Hyper-V doesn't
-	 * support interrupt remapping function, setting irq affinity of IO-APIC
-	 * interrupts still needs to change IO-APIC registers. But ioapic_
-	 * configure_entry() will ignore value of cfg->vector and cfg->
-	 * dest_apicid when IO-APIC's parent irq domain is not the vector
-	 * domain.(See ioapic_configure_entry()) In order to setting vector
-	 * and dest_apicid to IO-APIC register, IO-APIC entry pointer is saved
-	 * in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_
-	 * affinity() set vector and dest_apicid directly into IO-APIC entry.
-	 */
-	irq_data->chip_data = info->ioapic.entry;
-
 	/*
 	 * Hypver-V IO APIC irq affinity should be in the scope of
 	 * ioapic_max_cpumask because no irq remapping support.
@@ -119,22 +101,9 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain,
 	irq_domain_free_irqs_common(domain, virq, nr_irqs);
 }
 
-static int hyperv_irq_remapping_activate(struct irq_domain *domain,
-			  struct irq_data *irq_data, bool reserve)
-{
-	struct irq_cfg *cfg = irqd_cfg(irq_data);
-	struct IO_APIC_route_entry *entry = irq_data->chip_data;
-
-	entry->destid_0_7 = cfg->dest_apicid;
-	entry->vector = cfg->vector;
-
-	return 0;
-}
-
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
 	.alloc = hyperv_irq_remapping_alloc,
 	.free = hyperv_irq_remapping_free,
-	.activate = hyperv_irq_remapping_activate,
 };
 
 static int __init hyperv_prepare_irq_remapping(void)
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 625bdb9f1627..96c84b19940e 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1280,9 +1280,9 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 					     int index, int sub_handle)
 {
 	struct irte *irte = &data->irte_entry;
-	struct IO_APIC_route_entry *entry;
 
 	prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
+
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
 		/* Set source-id of interrupt request */
@@ -1293,39 +1293,20 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 			irte->trigger_mode, irte->dlvry_mode,
 			irte->avail, irte->vector, irte->dest_id,
 			irte->sid, irte->sq, irte->svt);
-
-		entry = info->ioapic.entry;
-		info->ioapic.entry = NULL;
-		memset(entry, 0, sizeof(*entry));
-		entry->ir_index_15	= !!(index & 0x8000);
-		entry->ir_format	= true;
-		entry->ir_index_0_14	= index & 0x7fff;
-		/*
-		 * IO-APIC RTE will be configured with virtual vector.
-		 * irq handler will do the explicit EOI to the io-apic.
-		 */
-		entry->vector		= info->ioapic.pin;
-		entry->is_level		= info->ioapic.is_level;
-		entry->active_low	= info->ioapic.active_low;
-		/* Mask level triggered irqs. */
-		entry->masked		= info->ioapic.is_level;
+		sub_handle = info->ioapic.pin;
 		break;
-
 	case X86_IRQ_ALLOC_TYPE_HPET:
+		set_hpet_sid(irte, info->devid);
+		break;
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		if (info->type == X86_IRQ_ALLOC_TYPE_HPET)
-			set_hpet_sid(irte, info->devid);
-		else
-			set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
-
-		fill_msi_msg(&data->msi_entry, index, sub_handle);
+		set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
 		break;
-
 	default:
 		BUG_ON(1);
 		break;
 	}
+	fill_msi_msg(&data->msi_entry, index, sub_handle);
 }
 
 static void intel_free_irq_resources(struct irq_domain *domain,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 22/35] genirq/irqdomain: Implement get_name() method on irqchip fwnodes
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (20 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 21/35] x86/ioapic: Generate RTE directly from parent irqchip's MSI message David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-25  9:41                       ` Marc Zyngier
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 23/35] x86/apic: Add select() method on vector irqdomain David Woodhouse
                                       ` (13 subsequent siblings)
  35 siblings, 2 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

Prerequesite to make x86 more irqdomain compliant.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/irq/irqdomain.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf8b374b892d..673fa64c1c44 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -42,7 +42,16 @@ static inline void debugfs_add_domain_dir(struct irq_domain *d) { }
 static inline void debugfs_remove_domain_dir(struct irq_domain *d) { }
 #endif
 
-const struct fwnode_operations irqchip_fwnode_ops;
+static const char *irqchip_fwnode_get_name(const struct fwnode_handle *fwnode)
+{
+	struct irqchip_fwid *fwid = container_of(fwnode, struct irqchip_fwid, fwnode);
+
+	return fwid->name;
+}
+
+const struct fwnode_operations irqchip_fwnode_ops = {
+	.get_name = irqchip_fwnode_get_name,
+};
 EXPORT_SYMBOL_GPL(irqchip_fwnode_ops);
 
 /**
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 23/35] x86/apic: Add select() method on vector irqdomain
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (21 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 22/35] genirq/irqdomain: Implement get_name() method on irqchip fwnodes David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 24/35] iommu/amd: Implement select() method on remapping irqdomain David Woodhouse
                                       ` (12 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

This will be used to select the irqdomain for I/O-APIC and HPET.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/irqdomain.h |  3 +++
 arch/x86/kernel/apic/vector.c    | 43 ++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
index cd684d45cb5f..125c23b7bad3 100644
--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -12,6 +12,9 @@ enum {
 	X86_IRQ_ALLOC_LEGACY				= 0x2,
 };
 
+extern int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec);
+extern int x86_fwspec_is_hpet(struct irq_fwspec *fwspec);
+
 extern struct irq_domain *x86_vector_domain;
 
 extern void init_irq_alloc_info(struct irq_alloc_info *info,
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb2e2a2488a5..b9b05caa28a4 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -636,7 +636,50 @@ static void x86_vector_debug_show(struct seq_file *m, struct irq_domain *d,
 }
 #endif
 
+int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec)
+{
+	if (fwspec->param_count != 1)
+		return 0;
+
+	if (is_fwnode_irqchip(fwspec->fwnode)) {
+		const char *fwname = fwnode_get_name(fwspec->fwnode);
+		return fwname && !strncmp(fwname, "IO-APIC-", 8) &&
+			simple_strtol(fwname+8, NULL, 10) == fwspec->param[0];
+	}
+	return to_of_node(fwspec->fwnode) &&
+		of_device_is_compatible(to_of_node(fwspec->fwnode),
+					"intel,ce4100-ioapic");
+}
+
+int x86_fwspec_is_hpet(struct irq_fwspec *fwspec)
+{
+	if (fwspec->param_count != 1)
+		return 0;
+
+	if (is_fwnode_irqchip(fwspec->fwnode)) {
+		const char *fwname = fwnode_get_name(fwspec->fwnode);
+		return fwname && !strncmp(fwname, "HPET-MSI-", 9) &&
+			simple_strtol(fwname+9, NULL, 10) == fwspec->param[0];
+	}
+	return 0;
+}
+
+static int x86_vector_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+			     enum irq_domain_bus_token bus_token)
+{
+	/*
+	 * HPET and I/OAPIC cannot be parented in the vector domain
+	 * if IRQ remapping is enabled. APIC IDs above 15 bits are
+	 * only permitted if IRQ remapping is enabled, so check that.
+	 */
+	if (apic->apic_id_valid(32768))
+		return 0;
+
+	return x86_fwspec_is_ioapic(fwspec) || x86_fwspec_is_hpet(fwspec);
+}
+
 static const struct irq_domain_ops x86_vector_domain_ops = {
+	.select		= x86_vector_select,
 	.alloc		= x86_vector_alloc_irqs,
 	.free		= x86_vector_free_irqs,
 	.activate	= x86_vector_activate,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 24/35] iommu/amd: Implement select() method on remapping irqdomain
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (22 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 23/35] x86/apic: Add select() method on vector irqdomain David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 25/35] iommu/vt-d: " David Woodhouse
                                       ` (11 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

Preparatory change to remove irq_remapping_get_irq_domain().

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/iommu/amd/iommu.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 9744cdbefd88..31b22244e9c2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3882,7 +3882,26 @@ static void irq_remapping_deactivate(struct irq_domain *domain,
 					    irte_info->index);
 }
 
+static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+				enum irq_domain_bus_token bus_token)
+{
+	struct amd_iommu *iommu;
+	int devid = -1;
+
+	if (x86_fwspec_is_ioapic(fwspec))
+		devid = get_ioapic_devid(fwspec->param[0]);
+	else if (x86_fwspec_is_hpet(fwspec))
+		devid = get_hpet_devid(fwspec->param[0]);
+
+	if (devid < 0)
+		return 0;
+
+	iommu = amd_iommu_rlookup_table[devid];
+	return iommu && iommu->ir_domain == d;
+}
+
 static const struct irq_domain_ops amd_ir_domain_ops = {
+	.select = irq_remapping_select,
 	.alloc = irq_remapping_alloc,
 	.free = irq_remapping_free,
 	.activate = irq_remapping_activate,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 25/35] iommu/vt-d: Implement select() method on remapping irqdomain
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (23 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 24/35] iommu/amd: Implement select() method on remapping irqdomain David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 26/35] iommu/hyper-v: " David Woodhouse
                                       ` (10 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

Preparatory for removing irq_remapping_get_irq_domain()

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/iommu/intel/irq_remapping.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 96c84b19940e..b3b079c0b51e 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1431,7 +1431,20 @@ static void intel_irq_remapping_deactivate(struct irq_domain *domain,
 	modify_irte(&data->irq_2_iommu, &entry);
 }
 
+static int intel_irq_remapping_select(struct irq_domain *d,
+				      struct irq_fwspec *fwspec,
+				      enum irq_domain_bus_token bus_token)
+{
+	if (x86_fwspec_is_ioapic(fwspec))
+		return d == map_ioapic_to_ir(fwspec->param[0]);
+	else if (x86_fwspec_is_hpet(fwspec))
+		return d == map_hpet_to_ir(fwspec->param[0]);
+
+	return 0;
+}
+
 static const struct irq_domain_ops intel_ir_domain_ops = {
+	.select = intel_irq_remapping_select,
 	.alloc = intel_irq_remapping_alloc,
 	.free = intel_irq_remapping_free,
 	.activate = intel_irq_remapping_activate,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 26/35] iommu/hyper-v: Implement select() method on remapping irqdomain
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (24 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 25/35] iommu/vt-d: " David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 27/35] x86/hpet: Use irq_find_matching_fwspec() to find " David Woodhouse
                                       ` (9 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

Preparatory for removing irq_remapping_get_irq_domain()

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/iommu/hyperv-iommu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 37dd485a5640..78a264ad9405 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -101,7 +101,16 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain,
 	irq_domain_free_irqs_common(domain, virq, nr_irqs);
 }
 
+static int hyperv_irq_remapping_select(struct irq_domain *d,
+				       struct irq_fwspec *fwspec,
+				       enum irq_domain_bus_token bus_token)
+{
+	/* Claim only the first (and only) I/OAPIC */
+	return x86_fwspec_is_ioapic(fwspec) && fwspec->param[0] == 0;
+}
+
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
+	.select = hyperv_irq_remapping_select,
 	.alloc = hyperv_irq_remapping_alloc,
 	.free = hyperv_irq_remapping_free,
 };
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 27/35] x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (25 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 26/35] iommu/hyper-v: " David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 28/35] x86/ioapic: " David Woodhouse
                                       ` (8 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

All possible parent domains have a select method now. Make use of it.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/hpet.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3b8b12769f3b..08651a4e6aa0 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -543,8 +543,8 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 {
 	struct msi_domain_info *domain_info;
 	struct irq_domain *parent, *d;
-	struct irq_alloc_info info;
 	struct fwnode_handle *fn;
+	struct irq_fwspec fwspec;
 
 	if (x86_vector_domain == NULL)
 		return NULL;
@@ -556,15 +556,6 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 	*domain_info = hpet_msi_domain_info;
 	domain_info->data = (void *)(long)hpet_id;
 
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-	info.devid = hpet_id;
-	parent = irq_remapping_get_irq_domain(&info);
-	if (parent == NULL)
-		parent = x86_vector_domain;
-	else
-		hpet_msi_controller.name = "IR-HPET-MSI";
-
 	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
 					      hpet_id);
 	if (!fn) {
@@ -572,6 +563,19 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 		return NULL;
 	}
 
+	fwspec.fwnode = fn;
+	fwspec.param_count = 1;
+	fwspec.param[0] = hpet_id;
+
+	parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+	if (!parent) {
+		irq_domain_free_fwnode(fn);
+		kfree(domain_info);
+		return NULL;
+	}
+	if (parent != x86_vector_domain)
+		hpet_msi_controller.name = "IR-HPET-MSI";
+
 	d = msi_create_irq_domain(fn, domain_info, parent);
 	if (!d) {
 		irq_domain_free_fwnode(fn);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 28/35] x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (26 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 27/35] x86/hpet: Use irq_find_matching_fwspec() to find " David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 29/35] x86: Kill all traces of irq_remapping_get_irq_domain() David Woodhouse
                                       ` (7 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

All possible parent domains have a select method now. Make use of it.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/io_apic.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ea983da1a57f..443d2c9086b9 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2320,36 +2320,37 @@ static inline void __init check_timer(void)
 
 static int mp_irqdomain_create(int ioapic)
 {
-	struct irq_alloc_info info;
 	struct irq_domain *parent;
 	int hwirqs = mp_ioapic_pin_count(ioapic);
 	struct ioapic *ip = &ioapics[ioapic];
 	struct ioapic_domain_cfg *cfg = &ip->irqdomain_cfg;
 	struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(ioapic);
 	struct fwnode_handle *fn;
-	char *name = "IO-APIC";
+	struct irq_fwspec fwspec;
 
 	if (cfg->type == IOAPIC_DOMAIN_INVALID)
 		return 0;
 
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
-	info.devid = mpc_ioapic_id(ioapic);
-	parent = irq_remapping_get_irq_domain(&info);
-	if (!parent)
-		parent = x86_vector_domain;
-	else
-		name = "IO-APIC-IR";
-
 	/* Handle device tree enumerated APICs proper */
 	if (cfg->dev) {
 		fn = of_node_to_fwnode(cfg->dev);
 	} else {
-		fn = irq_domain_alloc_named_id_fwnode(name, ioapic);
+		fn = irq_domain_alloc_named_id_fwnode("IO-APIC", ioapic);
 		if (!fn)
 			return -ENOMEM;
 	}
 
+	fwspec.fwnode = fn;
+	fwspec.param_count = 1;
+	fwspec.param[0] = ioapic;
+
+	parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+	if (!parent) {
+		if (!cfg->dev)
+			irq_domain_free_fwnode(fn);
+		return -ENODEV;
+	}
+
 	ip->irqdomain = irq_domain_create_linear(fn, hwirqs, cfg->ops,
 						 (void *)(long)ioapic);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 29/35] x86: Kill all traces of irq_remapping_get_irq_domain()
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (27 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 28/35] x86/ioapic: " David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 30/35] iommu/vt-d: Simplify intel_irq_remapping_select() David Woodhouse
                                       ` (6 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

All users are converted to use the fwspec based parent domain lookup.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/hw_irq.h        |  2 --
 arch/x86/include/asm/irq_remapping.h |  9 --------
 drivers/iommu/amd/iommu.c            | 34 ----------------------------
 drivers/iommu/hyperv-iommu.c         |  9 --------
 drivers/iommu/intel/irq_remapping.c  | 17 --------------
 drivers/iommu/irq_remapping.c        | 14 ------------
 drivers/iommu/irq_remapping.h        |  3 ---
 7 files changed, 88 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 83a69f62637e..458f5a676402 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -40,8 +40,6 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_PCI_MSIX,
 	X86_IRQ_ALLOC_TYPE_DMAR,
 	X86_IRQ_ALLOC_TYPE_UV,
-	X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT,
-	X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
 struct ioapic_alloc_info {
diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index af4a151d70b3..7cc49432187f 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -44,9 +44,6 @@ extern int irq_remapping_reenable(int);
 extern int irq_remap_enable_fault_handling(void);
 extern void panic_if_irq_remap(const char *msg);
 
-extern struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info);
-
 /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
 extern struct irq_domain *
 arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int id);
@@ -71,11 +68,5 @@ static inline void panic_if_irq_remap(const char *msg)
 {
 }
 
-static inline struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info)
-{
-	return NULL;
-}
-
 #endif /* CONFIG_IRQ_REMAP */
 #endif /* __X86_IRQ_REMAPPING_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 31b22244e9c2..463d322a4f3b 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3601,10 +3601,8 @@ static int get_devid(struct irq_alloc_info *info)
 {
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
 		return get_ioapic_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_HPET:
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
 		return get_hpet_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
@@ -3615,44 +3613,12 @@ static int get_devid(struct irq_alloc_info *info)
 	}
 }
 
-static struct irq_domain *get_irq_domain_for_devid(struct irq_alloc_info *info,
-						   int devid)
-{
-	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
-	if (!iommu)
-		return NULL;
-
-	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		return iommu->ir_domain;
-	default:
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-}
-
-static struct irq_domain *get_irq_domain(struct irq_alloc_info *info)
-{
-	int devid;
-
-	if (!info)
-		return NULL;
-
-	devid = get_devid(info);
-	if (devid < 0)
-		return NULL;
-	return get_irq_domain_for_devid(info, devid);
-}
-
 struct irq_remap_ops amd_iommu_irq_ops = {
 	.prepare		= amd_iommu_prepare,
 	.enable			= amd_iommu_enable,
 	.disable		= amd_iommu_disable,
 	.reenable		= amd_iommu_reenable,
 	.enable_faulting	= amd_iommu_enable_faulting,
-	.get_irq_domain		= get_irq_domain,
 };
 
 static void fill_msi_msg(struct msi_msg *msg, u32 index)
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 78a264ad9405..a629a6be65c7 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -160,18 +160,9 @@ static int __init hyperv_enable_irq_remapping(void)
 	return IRQ_REMAP_X2APIC_MODE;
 }
 
-static struct irq_domain *hyperv_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT)
-		return ioapic_ir_domain;
-	else
-		return NULL;
-}
-
 struct irq_remap_ops hyperv_irq_remap_ops = {
 	.prepare		= hyperv_prepare_irq_remapping,
 	.enable			= hyperv_enable_irq_remapping,
-	.get_irq_domain		= hyperv_get_irq_domain,
 };
 
 #endif
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index b3b079c0b51e..bca44015bc1d 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1127,29 +1127,12 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 	irte->redir_hint = 1;
 }
 
-static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (!info)
-		return NULL;
-
-	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-		return map_ioapic_to_ir(info->devid);
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		return map_hpet_to_ir(info->devid);
-	default:
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-}
-
 struct irq_remap_ops intel_irq_remap_ops = {
 	.prepare		= intel_prepare_irq_remapping,
 	.enable			= intel_enable_irq_remapping,
 	.disable		= disable_irq_remapping,
 	.reenable		= reenable_irq_remapping,
 	.enable_faulting	= enable_drhd_fault_handling,
-	.get_irq_domain		= intel_get_irq_domain,
 };
 
 static void intel_ir_reconfigure_irte(struct irq_data *irqd, bool force)
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 2d84b1ed205e..83314b9d8f38 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -158,17 +158,3 @@ void panic_if_irq_remap(const char *msg)
 	if (irq_remapping_enabled)
 		panic(msg);
 }
-
-/**
- * irq_remapping_get_irq_domain - Get the irqdomain serving the request @info
- * @info: interrupt allocation information, used to identify the IOMMU device
- *
- * Returns pointer to IRQ domain, or NULL on failure.
- */
-struct irq_domain *irq_remapping_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (!remap_ops || !remap_ops->get_irq_domain)
-		return NULL;
-
-	return remap_ops->get_irq_domain(info);
-}
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 1661b3d75920..8c89cb947cdb 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -42,9 +42,6 @@ struct irq_remap_ops {
 
 	/* Enable fault handling */
 	int  (*enable_faulting)(void);
-
-	/* Get the irqdomain associated to IOMMU device */
-	struct irq_domain *(*get_irq_domain)(struct irq_alloc_info *);
 };
 
 extern struct irq_remap_ops intel_irq_remap_ops;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 30/35] iommu/vt-d: Simplify intel_irq_remapping_select()
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (28 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 29/35] x86: Kill all traces of irq_remapping_get_irq_domain() David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 31/35] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
                                       ` (5 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

Now that the old get_irq_domain() method has gone, consolidate on just the
map_XXX_to_iommu() functions.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/iommu/intel/irq_remapping.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index bca44015bc1d..aeffda92b10b 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -203,13 +203,13 @@ static int modify_irte(struct irq_2_iommu *irq_iommu,
 	return rc;
 }
 
-static struct irq_domain *map_hpet_to_ir(u8 hpet_id)
+static struct intel_iommu *map_hpet_to_iommu(u8 hpet_id)
 {
 	int i;
 
 	for (i = 0; i < MAX_HPET_TBS; i++) {
 		if (ir_hpet[i].id == hpet_id && ir_hpet[i].iommu)
-			return ir_hpet[i].iommu->ir_domain;
+			return ir_hpet[i].iommu;
 	}
 	return NULL;
 }
@@ -225,13 +225,6 @@ static struct intel_iommu *map_ioapic_to_iommu(int apic)
 	return NULL;
 }
 
-static struct irq_domain *map_ioapic_to_ir(int apic)
-{
-	struct intel_iommu *iommu = map_ioapic_to_iommu(apic);
-
-	return iommu ? iommu->ir_domain : NULL;
-}
-
 static struct irq_domain *map_dev_to_ir(struct pci_dev *dev)
 {
 	struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev);
@@ -1418,12 +1411,14 @@ static int intel_irq_remapping_select(struct irq_domain *d,
 				      struct irq_fwspec *fwspec,
 				      enum irq_domain_bus_token bus_token)
 {
+	struct intel_iommu *iommu = NULL;
+
 	if (x86_fwspec_is_ioapic(fwspec))
-		return d == map_ioapic_to_ir(fwspec->param[0]);
+		iommu = map_ioapic_to_iommu(fwspec->param[0]);
 	else if (x86_fwspec_is_hpet(fwspec))
-		return d == map_hpet_to_ir(fwspec->param[0]);
+		iommu = map_hpet_to_iommu(fwspec->param[0]);
 
-	return 0;
+	return iommu && d == iommu->ir_domain;
 }
 
 static const struct irq_domain_ops intel_ir_domain_ops = {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 31/35] x86/ioapic: Handle Extended Destination ID field in RTE
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (29 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 30/35] iommu/vt-d: Simplify intel_irq_remapping_select() David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 32/35] x86/apic: Support 15 bits of APIC ID in MSI where available David Woodhouse
                                       ` (4 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

Bits 63-48 of the I/OAPIC Redirection Table Entry map directly to bits 19-4
of the address used in the resulting MSI cycle.

Historically, the x86 MSI format only used the top 8 of those 16 bits as
the destination APIC ID, and the "Extended Destination ID" in the lower 8
bits was unused.

With interrupt remapping, the lowest bit of the Extended Destination ID
(bit 48 of RTE, bit 4 of MSI address) is now used to indicate a remappable
format MSI.

A hypervisor can use the other 7 bits of the Extended Destination ID to
permit guests to address up to 15 bits of APIC IDs, thus allowing 32768
vCPUs before having to expose a vIOMMU and interrupt remapping to the
guest.

No behavioural change in this patch, since nothing yet permits APIC IDs
above 255 to be used with the non-IR I/OAPIC domain.

[ tglx: Converted it to the cleaned up entry/msi_msg format and added
  	commentry ]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/io_apic.h |  3 ++-
 arch/x86/kernel/apic/io_apic.c | 20 +++++++++++++++-----
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 73da644b2f0d..437aa8d00e53 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -67,7 +67,8 @@ struct IO_APIC_route_entry {
 				is_level		:  1,
 				masked			:  1,
 				reserved_0		: 15,
-				reserved_1		: 24,
+				reserved_1		: 17,
+				virt_destid_8_14	:  7,
 				destid_0_7		:  8;
 		};
 		struct {
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 443d2c9086b9..1cfd65ef295b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1238,9 +1238,10 @@ static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
 			       (entry.ir_index_15 << 15) | entry.ir_index_0_14,
 				entry.ir_zero);
 		} else {
-			printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n", buf,
+			printk(KERN_DEBUG "%s, %s, D(%02X%02X), M(%1d)\n", buf,
 			       entry.dest_mode_logical ? "logical " : "physical",
-			       entry.destid_0_7, entry.delivery_mode);
+			       entry.virt_destid_8_14, entry.destid_0_7,
+			       entry.delivery_mode);
 		}
 	}
 }
@@ -1409,6 +1410,7 @@ void native_restore_boot_irq_mode(void)
 	 */
 	if (ioapic_i8259.pin != -1) {
 		struct IO_APIC_route_entry entry;
+		u32 apic_id = read_apic_id();
 
 		memset(&entry, 0, sizeof(entry));
 		entry.masked		= false;
@@ -1416,7 +1418,8 @@ void native_restore_boot_irq_mode(void)
 		entry.active_low	= false;
 		entry.dest_mode_logical	= false;
 		entry.delivery_mode	= APIC_DELIVERY_MODE_EXTINT;
-		entry.destid_0_7	= read_apic_id();
+		entry.destid_0_7	= apic_id & 0xFF;
+		entry.virt_destid_8_14	= apic_id >> 8;
 
 		/*
 		 * Add it to the IO-APIC irq-routing table:
@@ -1885,7 +1888,11 @@ static void ioapic_setup_msg_from_msi(struct irq_data *irq_data,
 	/* DMAR/IR: 1, 0 for all other modes */
 	entry->ir_format		= msg.arch_addr_lo.dmar_format;
 	/*
-	 * DMAR/IR: index bit 0-14.
+	 * - DMAR/IR: index bit 0-14.
+	 *
+	 * - Virt: If the host supports x2apic without a virtualized IR
+	 *	   unit then bit 0-6 of dmar_index_0_14 are providing bit
+	 *	   8-14 of the destination id.
 	 *
 	 * All other modes have bit 0-6 of dmar_index_0_14 cleared and the
 	 * topmost 8 bits are destination id bit 0-7 (entry::destid_0_7).
@@ -2063,6 +2070,7 @@ static inline void __init unlock_ExtINT_logic(void)
 	int apic, pin, i;
 	struct IO_APIC_route_entry entry0, entry1;
 	unsigned char save_control, save_freq_select;
+	u32 apic_id;
 
 	pin  = find_isa_irq_pin(8, mp_INT);
 	if (pin == -1) {
@@ -2078,11 +2086,13 @@ static inline void __init unlock_ExtINT_logic(void)
 	entry0 = ioapic_read_entry(apic, pin);
 	clear_IO_APIC_pin(apic, pin);
 
+	apic_id = hard_smp_processor_id();
 	memset(&entry1, 0, sizeof(entry1));
 
 	entry1.dest_mode_logical	= true;
 	entry1.masked			= false;
-	entry1.destid_0_7		= hard_smp_processor_id();
+	entry1.destid_0_7		= apic_id & 0xFF;
+	entry1.virt_destid_8_14		= apic_id >> 8;
 	entry1.delivery_mode		= APIC_DELIVERY_MODE_EXTINT;
 	entry1.active_low		= entry0.active_low;
 	entry1.is_level			= false;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 32/35] x86/apic: Support 15 bits of APIC ID in MSI where available
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (30 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 31/35] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 33/35] iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available David Woodhouse
                                       ` (3 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

Some hypervisors can allow the guest to use the Extended Destination ID
field in the MSI address to address up to 32768 CPUs.

This applies to all downstream devices which generate MSI cycles,
including HPET, I/OAPIC and PCI MSI.

HPET and PCI MSI use the same __irq_msi_compose_msg() function, while
I/OAPIC generates its own and had support for the extended bits added in
a previous commit.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201009104616.1314746-6-dwmw2@infradead.org
---
 arch/x86/include/asm/msi.h      |  3 ++-
 arch/x86/include/asm/x86_init.h |  2 ++
 arch/x86/kernel/apic/apic.c     | 26 ++++++++++++++++++++++++--
 arch/x86/kernel/x86_init.c      |  1 +
 4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index 322fd905da9c..b85147d75626 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -29,7 +29,8 @@ typedef struct x86_msi_addr_lo {
 			u32	reserved_0		:  2,
 				dest_mode_logical	:  1,
 				redirect_hint		:  1,
-				reserved_1		:  8,
+				reserved_1		:  1,
+				virt_destid_8_14	:  7,
 				destid_0_7		:  8,
 				base_address		: 12;
 		};
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index dde5b3f1e7cd..5c69f7eb5d47 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -116,6 +116,7 @@ struct x86_init_pci {
  * @init_platform:		platform setup
  * @guest_late_init:		guest late init
  * @x2apic_available:		X2APIC detection
+ * @msi_ext_dest_id:		MSI supports 15-bit APIC IDs
  * @init_mem_mapping:		setup early mappings during init_mem_mapping()
  * @init_after_bootmem:		guest init after boot allocator is finished
  */
@@ -123,6 +124,7 @@ struct x86_hyper_init {
 	void (*init_platform)(void);
 	void (*guest_late_init)(void);
 	bool (*x2apic_available)(void);
+	bool (*msi_ext_dest_id)(void);
 	void (*init_mem_mapping)(void);
 	void (*init_after_bootmem)(void);
 };
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index f7196ee0f005..6bd20c0de8bc 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -93,6 +93,11 @@ static unsigned int disabled_cpu_apicid __ro_after_init = BAD_APICID;
  */
 static int apic_extnmi __ro_after_init = APIC_EXTNMI_BSP;
 
+/*
+ * Hypervisor supports 15 bits of APIC ID in MSI Extended Destination ID
+ */
+static bool virt_ext_dest_id __ro_after_init;
+
 /*
  * Map cpu index to physical APIC ID
  */
@@ -1841,6 +1846,8 @@ static __init void try_to_enable_x2apic(int remap_mode)
 		return;
 
 	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
+		u32 apic_limit = 255;
+
 		/*
 		 * Using X2APIC without IR is not architecturally supported
 		 * on bare metal but may be supported in guests.
@@ -1851,12 +1858,22 @@ static __init void try_to_enable_x2apic(int remap_mode)
 			return;
 		}
 
+		/*
+		 * If the hypervisor supports extended destination ID in
+		 * MSI, that increases the maximum APIC ID that can be
+		 * used for non-remapped IRQ domains.
+		 */
+		if (x86_init.hyper.msi_ext_dest_id()) {
+			virt_ext_dest_id = 1;
+			apic_limit = 32767;
+		}
+
 		/*
 		 * Without IR, all CPUs can be addressed by IOAPIC/MSI only
 		 * in physical mode, and CPUs with an APIC ID that cannnot
 		 * be addressed must not be brought online.
 		 */
-		x2apic_set_max_apicid(255);
+		x2apic_set_max_apicid(apic_limit);
 		x2apic_phys = 1;
 	}
 	x2apic_enable();
@@ -2497,10 +2514,15 @@ void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 	 * Only the IOMMU itself can use the trick of putting destination
 	 * APIC ID into the high bits of the address. Anything else would
 	 * just be writing to memory if it tried that, and needs IR to
-	 * address higher APIC IDs.
+	 * address APICs which can't be addressed in the normal 32-bit
+	 * address range at 0xFFExxxxx. That is typically just 8 bits, but
+	 * some hypervisors allow the extended destination ID field in bits
+	 * 5-11 to be used, giving support for 15 bits of APIC IDs in total.
 	 */
 	if (dmar)
 		msg->arch_addr_hi.destid_8_31 = cfg->dest_apicid >> 8;
+	else if (virt_ext_dest_id && cfg->dest_apicid < 0x8000)
+		msg->arch_addr_lo.virt_destid_8_14 = cfg->dest_apicid >> 8;
 	else
 		WARN_ON_ONCE(cfg->dest_apicid > 0xFF);
 }
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index a3038d8deb6a..8b395821cb8d 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -110,6 +110,7 @@ struct x86_init_ops x86_init __initdata = {
 		.init_platform		= x86_init_noop,
 		.guest_late_init	= x86_init_noop,
 		.x2apic_available	= bool_x86_init_noop,
+		.msi_ext_dest_id	= bool_x86_init_noop,
 		.init_mem_mapping	= x86_init_noop,
 		.init_after_bootmem	= x86_init_noop,
 	},
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 33/35] iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (31 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 32/35] x86/apic: Support 15 bits of APIC ID in MSI where available David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 34/35] x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
                                       ` (2 subsequent siblings)
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

If the 15-bit APIC ID support is present in emulated MSI then there's no
need for the pseudo-remapping support.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/iommu/hyperv-iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index a629a6be65c7..9438daa24fdb 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -121,6 +121,7 @@ static int __init hyperv_prepare_irq_remapping(void)
 	int i;
 
 	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
+	    x86_init.hyper.msi_ext_dest_id() ||
 	    !x2apic_supported())
 		return -ENODEV;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 34/35] x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (32 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 33/35] iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-24 21:35                     ` [PATCH v3 35/35] x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected David Woodhouse
  2020-10-25  8:12                     ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
  35 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

No functional change; just reserve the feature bit for now so that VMMs
can start to implement it.

This will allow the host to indicate that MSI emulation supports 15-bit
destination IDs, allowing up to 32768 CPUs without interrupt remapping.

cf. https://patchwork.kernel.org/patch/11816693/ for qemu

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/cpuid.rst     | 4 ++++
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index 9150e9d1c39b..f70b655821d5 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT          14          guest checks this feature bit
                                               async pf acknowledgment msr
                                               0x4b564d07.
 
+KVM_FEATURE_MSI_EXT_DEST_ID       15          guest checks this feature bit
+                                              before using extended destination
+                                              ID bits in MSI address bits 11-5.
+
 KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24          host will warn if no guest-side
                                               per-cpu warps are expeced in
                                               kvmclock
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 812e9b4c1114..950afebfba88 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -32,6 +32,7 @@
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
 #define KVM_FEATURE_ASYNC_PF_INT	14
+#define KVM_FEATURE_MSI_EXT_DEST_ID	15
 
 #define KVM_HINTS_REALTIME      0
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 35/35] x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (33 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 34/35] x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
@ 2020-10-24 21:35                     ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-10-25  8:12                     ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
  35 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-24 21:35 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse <dwmw@amazon.co.uk>

This allows the host to indicate that MSI emulation supports 15-bit
destination IDs, allowing up to 32768 CPUs without interrupt remapping.

cf. https://patchwork.kernel.org/patch/11816693/ for qemu

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kernel/kvm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 1c0f2560a41c..b82de2843814 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -740,6 +740,11 @@ static void __init kvm_apic_init(void)
 #endif
 }
 
+static bool __init kvm_msi_ext_dest_id(void)
+{
+	return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID);
+}
+
 static void __init kvm_init_platform(void)
 {
 	kvmclock_init();
@@ -769,6 +774,7 @@ const __initconst struct hypervisor_x86 x86_hyper_kvm = {
 	.type				= X86_HYPER_KVM,
 	.init.guest_late_init		= kvm_guest_init,
 	.init.x2apic_available		= kvm_para_available,
+	.init.msi_ext_dest_id		= kvm_msi_ext_dest_id,
 	.init.init_platform		= kvm_init_platform,
 #if defined(CONFIG_AMD_MEM_ENCRYPT)
 	.runtime.sev_es_hcall_prepare	= kvm_sev_es_hcall_prepare,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields
  2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
                                       ` (34 preceding siblings ...)
  2020-10-24 21:35                     ` [PATCH v3 35/35] x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected David Woodhouse
@ 2020-10-25  8:12                     ` David Woodhouse
  35 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-10-25  8:12 UTC (permalink / raw)
  To: x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

[-- Attachment #1: Type: text/plain, Size: 1410 bytes --]

On Sat, 2020-10-24 at 22:35 +0100, David Woodhouse wrote:
> Fix the conditions for enabling x2apic on guests without interrupt 
> remapping, and support 15-bit Extended Destination ID to allow 32768 
> CPUs without IR on hypervisors that support it.
> 
> Make the I/OAPIC code generate its RTE directly from the MSI message
> created by the parent irqchip, and fix up a bunch of magic mask/shift
> macros to use bitfields for MSI messages and I/OAPIC RTEs while we're
> at it.

Forgot to mention (since I thought I'd posted it in a previous series)
that v3 also ditches irq_remapping_get_irq_domain() and some icky
special cases of hard-coding "x86_vector_domain", and makes HPET and
I/OAPIC use irq_find_matching_fwspeC() to find their parent irqdomain.

> v3:
>  • Lots of bitfield cleanups from Thomas.
>  • Disable hyperv-iommu if 15-bit extension is present.
>  • Fix inconsistent CONFIG_PCI_MSI/CONFIG_GENERIC_MSI_IRQ in hpet.c
>  • Split KVM_FEATURE_MSI_EXT_DEST_ID patch, half of which is going upstream
>    through KVM tree (and the other half needs to wait, or have an #ifdef) so
>    is left at the top of the tree.
> 
> v2:
>  • Minor cleanups.
>  • Move __irq_msi_compose_msg() to apic.c, make virt_ext_dest_id static.
>  • Generate I/OAPIC RTE directly from parent irqchip's MSI messages.
>  • Clean up HPET MSI support into hpet.c now that we can.


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 22/35] genirq/irqdomain: Implement get_name() method on irqchip fwnodes
  2020-10-24 21:35                     ` [PATCH v3 22/35] genirq/irqdomain: Implement get_name() method on irqchip fwnodes David Woodhouse
@ 2020-10-25  9:41                       ` Marc Zyngier
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  1 sibling, 0 replies; 192+ messages in thread
From: Marc Zyngier @ 2020-10-25  9:41 UTC (permalink / raw)
  To: David Woodhouse
  Cc: x86, kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini,
	linux-kernel, linux-hyperv, Dexuan Cui

Hi David,

nit: please use my kernel.org address for kernel related stuff.

On Sat, 24 Oct 2020 22:35:22 +0100,
David Woodhouse <dwmw2@infradead.org> wrote:
> 
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Prerequesite to make x86 more irqdomain compliant.

Prerequisite?

> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/irq/irqdomain.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index cf8b374b892d..673fa64c1c44 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -42,7 +42,16 @@ static inline void debugfs_add_domain_dir(struct irq_domain *d) { }
>  static inline void debugfs_remove_domain_dir(struct irq_domain *d) { }
>  #endif
>  
> -const struct fwnode_operations irqchip_fwnode_ops;
> +static const char *irqchip_fwnode_get_name(const struct fwnode_handle *fwnode)
> +{
> +	struct irqchip_fwid *fwid = container_of(fwnode, struct irqchip_fwid, fwnode);
> +
> +	return fwid->name;
> +}
> +
> +const struct fwnode_operations irqchip_fwnode_ops = {
> +	.get_name = irqchip_fwnode_get_name,
> +};
>  EXPORT_SYMBOL_GPL(irqchip_fwnode_ops);
>  
>  /**

Acked-by: Marc Zyngier <maz@kernel.org>

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v3 17/35] x86/pci/xen: Use msi_msg shadow structs
  2020-10-24 21:35                     ` [PATCH v3 17/35] x86/pci/xen: " David Woodhouse
@ 2020-10-25  9:49                       ` David Laight
  2020-10-25 10:26                         ` David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  1 sibling, 1 reply; 192+ messages in thread
From: David Laight @ 2020-10-25  9:49 UTC (permalink / raw)
  To: 'David Woodhouse', x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse
> Sent: 24 October 2020 22:35
> 
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Use the msi_msg shadow structs and compose the message with named bitfields
> instead of the unreadable macro maze.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/pci/xen.c | 26 +++++++++++---------------
>  1 file changed, 11 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index c552cd2d0632..3d41a09c2c14 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -152,7 +152,6 @@ static int acpi_register_gsi_xen(struct device *dev, u32 gsi,
> 
>  #if defined(CONFIG_PCI_MSI)
>  #include <linux/msi.h>
> -#include <asm/msidef.h>
> 
>  struct xen_pci_frontend_ops *xen_pci_frontend;
>  EXPORT_SYMBOL_GPL(xen_pci_frontend);
> @@ -210,23 +209,20 @@ static int xen_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>  	return ret;
>  }
> 
> -#define XEN_PIRQ_MSI_DATA  (MSI_DATA_TRIGGER_EDGE | \
> -		MSI_DATA_LEVEL_ASSERT | (3 << 8) | MSI_DATA_VECTOR(0))
> -
>  static void xen_msi_compose_msg(struct pci_dev *pdev, unsigned int pirq,
>  		struct msi_msg *msg)
>  {
> -	/* We set vector == 0 to tell the hypervisor we don't care about it,
> -	 * but we want a pirq setup instead.
> -	 * We use the dest_id field to pass the pirq that we want. */
> -	msg->address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(pirq);
> -	msg->address_lo =
> -		MSI_ADDR_BASE_LO |
> -		MSI_ADDR_DEST_MODE_PHYSICAL |
> -		MSI_ADDR_REDIRECTION_CPU |
> -		MSI_ADDR_DEST_ID(pirq);
> -
> -	msg->data = XEN_PIRQ_MSI_DATA;
> +	/*
> +	 * We set vector == 0 to tell the hypervisor we don't care about
> +	 * it, but we want a pirq setup instead.  We use the dest_id fields
> +	 * to pass the pirq that we want.
> +	 */
> +	memset(msg, 0, sizeof(*msg));
> +	msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
> +	msg->arch_addr_hi.destid_8_31 = pirq >> 8;
> +	msg->arch_addr_lo.destid_0_7 = pirq & 0xFF;
> +	msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
> +	msg->arch_data.delivery_mode = APIC_DELIVERY_MODE_EXTINT;
>  }

Just looking at a random one of these patches...

Does the compiler manage to optimise that reasonably?
Or does it generate a lot of shifts and masks as each
bitfield is set?

The code generation for bitfields is often a lot worse
that that for |= setting bits in a word.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 17/35] x86/pci/xen: Use msi_msg shadow structs
  2020-10-25  9:49                       ` David Laight
@ 2020-10-25 10:26                         ` David Woodhouse
  2020-10-25 13:20                           ` David Laight
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-10-25 10:26 UTC (permalink / raw)
  To: David Laight, x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

[-- Attachment #1: Type: text/plain, Size: 4547 bytes --]

On Sun, 2020-10-25 at 09:49 +0000, David Laight wrote:
> Just looking at a random one of these patches...
> 
> Does the compiler manage to optimise that reasonably?
> Or does it generate a lot of shifts and masks as each
> bitfield is set?
> 
> The code generation for bitfields is often a lot worse
> that that for |= setting bits in a word.

Indeed, it appears to be utterly appalling. That was one of my
motivations for doing it with masks and shifts in the first place.

Because in ioapic_setup_msg_from_msi(), for example, these two are
consecutive in both source and destination:

	entry->vector			= msg.arch_data.vector;
	entry->delivery_mode		= msg.arch_data.delivery_mode;

And so are those:

	entry->ir_format		= msg.arch_addr_lo.dmar_format;
	entry->ir_index_0_14		= msg.arch_addr_lo.dmar_index_0_14;

But GCC 7.5.0 here is doing them all separately...

00000000000011e0 <ioapic_setup_msg_from_msi>:
{
    11e0:       53                      push   %rbx
    11e1:       48 89 f3                mov    %rsi,%rbx
    11e4:       48 83 ec 18             sub    $0x18,%rsp
        irq_chip_compose_msi_msg(irq_data, &msg);
    11e8:       48 89 e6                mov    %rsp,%rsi
{
    11eb:       65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
    11f2:       00 00 
    11f4:       48 89 44 24 10          mov    %rax,0x10(%rsp)
    11f9:       31 c0                   xor    %eax,%eax
        irq_chip_compose_msi_msg(irq_data, &msg);
    11fb:       e8 00 00 00 00          callq  1200 <ioapic_setup_msg_from_msi+0x20>
        entry->vector                   = msg.arch_data.vector;
    1200:       0f b6 44 24 08          movzbl 0x8(%rsp),%eax
        entry->delivery_mode            = msg.arch_data.delivery_mode;
    1205:       0f b6 53 01             movzbl 0x1(%rbx),%edx
    1209:       0f b6 74 24 09          movzbl 0x9(%rsp),%esi
        entry->vector                   = msg.arch_data.vector;
    120e:       88 03                   mov    %al,(%rbx)
        entry->dest_mode_logical        = msg.arch_addr_lo.dest_mode_logical;
    1210:       0f b6 04 24             movzbl (%rsp),%eax
        entry->delivery_mode            = msg.arch_data.delivery_mode;
    1214:       83 e2 f0                and    $0xfffffff0,%edx
    1217:       83 e6 07                and    $0x7,%esi
        entry->dest_mode_logical        = msg.arch_addr_lo.dest_mode_logical;
    121a:       09 f2                   or     %esi,%edx
    121c:       8d 0c 00                lea    (%rax,%rax,1),%ecx
        entry->ir_format                = msg.arch_addr_lo.dmar_format;
    121f:       c0 e8 04                shr    $0x4,%al
        entry->dest_mode_logical        = msg.arch_addr_lo.dest_mode_logical;
    1222:       83 e1 08                and    $0x8,%ecx
    1225:       09 ca                   or     %ecx,%edx
    1227:       88 53 01                mov    %dl,0x1(%rbx)
        entry->ir_format                = msg.arch_addr_lo.dmar_format;
    122a:       89 c2                   mov    %eax,%edx
        entry->ir_index_0_14            = msg.arch_addr_lo.dmar_index_0_14;
    122c:       8b 04 24                mov    (%rsp),%eax
    122f:       83 e2 01                and    $0x1,%edx
    1232:       c1 e8 05                shr    $0x5,%eax
    1235:       66 25 ff 7f             and    $0x7fff,%ax
    1239:       8d 0c 00                lea    (%rax,%rax,1),%ecx
    123c:       66 c1 e8 07             shr    $0x7,%ax
    1240:       88 43 07                mov    %al,0x7(%rbx)
    1243:       09 ca                   or     %ecx,%edx
}
    1245:       48 8b 44 24 10          mov    0x10(%rsp),%rax
    124a:       65 48 33 04 25 28 00    xor    %gs:0x28,%rax
    1251:       00 00 
        entry->ir_index_0_14            = msg.arch_addr_lo.dmar_index_0_14;
    1253:       88 53 06                mov    %dl,0x6(%rbx)
}
    1256:       75 06                   jne    125e <ioapic_setup_msg_from_msi+0x7e>
    1258:       48 83 c4 18             add    $0x18,%rsp
    125c:       5b                      pop    %rbx
    125d:       c3                      retq   
    125e:       e8 00 00 00 00          callq  1263 <ioapic_setup_msg_from_msi+0x83>
    1263:       0f 1f 00                nopl   (%rax)
    1266:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
    126d:       00 00 00 


Still, this isn't really a fast path so I'm content to file the GCC PR
for the really *obvious* misses and let it take its course.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v3 17/35] x86/pci/xen: Use msi_msg shadow structs
  2020-10-25 10:26                         ` David Woodhouse
@ 2020-10-25 13:20                           ` David Laight
  0 siblings, 0 replies; 192+ messages in thread
From: David Laight @ 2020-10-25 13:20 UTC (permalink / raw)
  To: 'David Woodhouse', x86
  Cc: kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini, linux-kernel,
	linux-hyperv, maz, Dexuan Cui

From: David Woodhouse
> Sent: 25 October 2020 10:26
> To: David Laight <David.Laight@ACULAB.COM>; x86@kernel.org
> 
> On Sun, 2020-10-25 at 09:49 +0000, David Laight wrote:
> > Just looking at a random one of these patches...
> >
> > Does the compiler manage to optimise that reasonably?
> > Or does it generate a lot of shifts and masks as each
> > bitfield is set?
> >
> > The code generation for bitfields is often a lot worse
> > that that for |= setting bits in a word.
> 
> Indeed, it appears to be utterly appalling. That was one of my
> motivations for doing it with masks and shifts in the first place.

I thought it would be.

I'm not even sure using bitfields to map hardware registers
makes the code any more readable (or fool proof).

I suspect using #define constants and explicit |= and &= ~
is actually best - provided the names are descriptive enough.

If you set all the fields the compiler will merge the values
and do a single write - provided nothing you read might
alias the target.

The only way to get strongly typed integers is to cast them
to a structure pointer type.
Together with a static inline function to remove the casts
and | the values together it might make things fool proof.
But I've never tried it.

ISTR once doing something like that with error codes, but
it didn't ever make source control.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 15/35] PCI: vmd: Use msi_msg shadow structs
  2020-10-24 21:35                     ` [PATCH v3 15/35] PCI: vmd: " David Woodhouse
@ 2020-10-28 20:49                       ` Kees Cook
  2020-10-28 21:13                         ` Thomas Gleixner
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
  1 sibling, 1 reply; 192+ messages in thread
From: Kees Cook @ 2020-10-28 20:49 UTC (permalink / raw)
  To: David Woodhouse
  Cc: x86, kvm, iommu, joro, Thomas Gleixner, Paolo Bonzini,
	linux-kernel, linux-hyperv, maz, Dexuan Cui

On Sat, Oct 24, 2020 at 10:35:15PM +0100, David Woodhouse wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Use the x86 shadow structs in msi_msg instead of the macros.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  drivers/pci/controller/vmd.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index aa1b12bac9a1..72de3c6f644e 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -18,7 +18,6 @@
>  #include <asm/irqdomain.h>
>  #include <asm/device.h>
>  #include <asm/msi.h>
> -#include <asm/msidef.h>
>  
>  #define VMD_CFGBAR	0
>  #define VMD_MEMBAR1	2
> @@ -131,10 +130,10 @@ static void vmd_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>  	struct vmd_irq_list *irq = vmdirq->irq;
>  	struct vmd_dev *vmd = irq_data_get_irq_handler_data(data);
>  
> -	msg->address_hi = MSI_ADDR_BASE_HI;
> -	msg->address_lo = MSI_ADDR_BASE_LO |
> -			  MSI_ADDR_DEST_ID(index_from_irqs(vmd, irq));
> -	msg->data = 0;
> +	memset(&msg, 0, sizeof(*msg);

This should be:

+	memset(msg, 0, sizeof(*msg);

https://groups.google.com/g/clang-built-linux/c/N-DfCPz3alg

> +	msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
> +	msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
> +	msg->arch_addr_lo.destid_0_7 = index_from_irqs(vmd, irq);
>  }
>  
>  /*
> -- 
> 2.26.2
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 15/35] PCI: vmd: Use msi_msg shadow structs
  2020-10-28 20:49                       ` Kees Cook
@ 2020-10-28 21:13                         ` Thomas Gleixner
  2020-10-28 23:22                           ` Kees Cook
  0 siblings, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-10-28 21:13 UTC (permalink / raw)
  To: Kees Cook, David Woodhouse
  Cc: x86, kvm, iommu, joro, Paolo Bonzini, linux-kernel, linux-hyperv,
	maz, Dexuan Cui

On Wed, Oct 28 2020 at 13:49, Kees Cook wrote:
> On Sat, Oct 24, 2020 at 10:35:15PM +0100, David Woodhouse wrote:
>> +	memset(&msg, 0, sizeof(*msg);
>
> This should be:
>
> +	memset(msg, 0, sizeof(*msg);

        memset(msg, 0, sizeof(*msg));

Then it compiles _and_ is correct :)

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 15/35] PCI: vmd: Use msi_msg shadow structs
  2020-10-28 21:13                         ` Thomas Gleixner
@ 2020-10-28 23:22                           ` Kees Cook
  0 siblings, 0 replies; 192+ messages in thread
From: Kees Cook @ 2020-10-28 23:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Woodhouse, x86, kvm, iommu, joro, Paolo Bonzini,
	linux-kernel, linux-hyperv, maz, Dexuan Cui

On Wed, Oct 28, 2020 at 10:13:52PM +0100, Thomas Gleixner wrote:
> On Wed, Oct 28 2020 at 13:49, Kees Cook wrote:
> > On Sat, Oct 24, 2020 at 10:35:15PM +0100, David Woodhouse wrote:
> >> +	memset(&msg, 0, sizeof(*msg);
> >
> > This should be:
> >
> > +	memset(msg, 0, sizeof(*msg);
> 
>         memset(msg, 0, sizeof(*msg));
> 
> Then it compiles _and_ is correct :)

\o/ ;)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected
  2020-10-24 21:35                     ` [PATCH v3 35/35] x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: David Woodhouse, Thomas Gleixner, Paolo Bonzini, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     2e008ffe426f927b1697adb4ed10c1e419927ae4
Gitweb:        https://git.kernel.org/tip/2e008ffe426f927b1697adb4ed10c1e419927ae4
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:35 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:33 +01:00

x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected

This allows the host to indicate that MSI emulation supports 15-bit
destination IDs, allowing up to 32768 CPUs without interrupt remapping.

cf. https://patchwork.kernel.org/patch/11816693/ for qemu

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20201024213535.443185-36-dwmw2@infradead.org

---
 arch/x86/kernel/kvm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 7f57ede..5e78e01 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -740,6 +740,11 @@ static void __init kvm_apic_init(void)
 #endif
 }
 
+static bool __init kvm_msi_ext_dest_id(void)
+{
+	return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID);
+}
+
 static void __init kvm_init_platform(void)
 {
 	kvmclock_init();
@@ -769,6 +774,7 @@ const __initconst struct hypervisor_x86 x86_hyper_kvm = {
 	.type				= X86_HYPER_KVM,
 	.init.guest_late_init		= kvm_guest_init,
 	.init.x2apic_available		= kvm_para_available,
+	.init.msi_ext_dest_id		= kvm_msi_ext_dest_id,
 	.init.init_platform		= kvm_init_platform,
 #if defined(CONFIG_AMD_MEM_ENCRYPT)
 	.runtime.sev_es_hcall_prepare	= kvm_sev_es_hcall_prepare,

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available
  2020-10-24 21:35                     ` [PATCH v3 33/35] iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     bf27ef8a77d8da38c9f35f8f6aab013a2dcf175f
Gitweb:        https://git.kernel.org/tip/bf27ef8a77d8da38c9f35f8f6aab013a2dcf175f
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:33 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:31 +01:00

iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available

If the 15-bit APIC ID support is present in emulated MSI then there's no
need for the pseudo-remapping support.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-34-dwmw2@infradead.org

---
 drivers/iommu/hyperv-iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index a629a6b..9438daa 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -121,6 +121,7 @@ static int __init hyperv_prepare_irq_remapping(void)
 	int i;
 
 	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
+	    x86_init.hyper.msi_ext_dest_id() ||
 	    !x2apic_supported())
 		return -ENODEV;
 

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/apic: Support 15 bits of APIC ID in MSI where available
  2020-10-24 21:35                     ` [PATCH v3 32/35] x86/apic: Support 15 bits of APIC ID in MSI where available David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     ab0f59c6f135289c7ea90b0e2471674bf289d884
Gitweb:        https://git.kernel.org/tip/ab0f59c6f135289c7ea90b0e2471674bf289d884
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:32 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:29 +01:00

x86/apic: Support 15 bits of APIC ID in MSI where available

Some hypervisors can allow the guest to use the Extended Destination ID
field in the MSI address to address up to 32768 CPUs.

This applies to all downstream devices which generate MSI cycles,
including HPET, I/O-APIC and PCI MSI.

HPET and PCI MSI use the same __irq_msi_compose_msg() function, while
I/O-APIC generates its own and had support for the extended bits added in
a previous commit.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-33-dwmw2@infradead.org

---
 arch/x86/include/asm/msi.h      |  3 ++-
 arch/x86/include/asm/x86_init.h |  2 ++
 arch/x86/kernel/apic/apic.c     | 26 ++++++++++++++++++++++++--
 arch/x86/kernel/x86_init.c      |  1 +
 4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index 322fd90..b85147d 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -29,7 +29,8 @@ typedef struct x86_msi_addr_lo {
 			u32	reserved_0		:  2,
 				dest_mode_logical	:  1,
 				redirect_hint		:  1,
-				reserved_1		:  8,
+				reserved_1		:  1,
+				virt_destid_8_14	:  7,
 				destid_0_7		:  8,
 				base_address		: 12;
 		};
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index dde5b3f..5c69f7e 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -116,6 +116,7 @@ struct x86_init_pci {
  * @init_platform:		platform setup
  * @guest_late_init:		guest late init
  * @x2apic_available:		X2APIC detection
+ * @msi_ext_dest_id:		MSI supports 15-bit APIC IDs
  * @init_mem_mapping:		setup early mappings during init_mem_mapping()
  * @init_after_bootmem:		guest init after boot allocator is finished
  */
@@ -123,6 +124,7 @@ struct x86_hyper_init {
 	void (*init_platform)(void);
 	void (*guest_late_init)(void);
 	bool (*x2apic_available)(void);
+	bool (*msi_ext_dest_id)(void);
 	void (*init_mem_mapping)(void);
 	void (*init_after_bootmem)(void);
 };
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index f7196ee..6bd20c0 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -94,6 +94,11 @@ static unsigned int disabled_cpu_apicid __ro_after_init = BAD_APICID;
 static int apic_extnmi __ro_after_init = APIC_EXTNMI_BSP;
 
 /*
+ * Hypervisor supports 15 bits of APIC ID in MSI Extended Destination ID
+ */
+static bool virt_ext_dest_id __ro_after_init;
+
+/*
  * Map cpu index to physical APIC ID
  */
 DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
@@ -1841,6 +1846,8 @@ static __init void try_to_enable_x2apic(int remap_mode)
 		return;
 
 	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
+		u32 apic_limit = 255;
+
 		/*
 		 * Using X2APIC without IR is not architecturally supported
 		 * on bare metal but may be supported in guests.
@@ -1852,11 +1859,21 @@ static __init void try_to_enable_x2apic(int remap_mode)
 		}
 
 		/*
+		 * If the hypervisor supports extended destination ID in
+		 * MSI, that increases the maximum APIC ID that can be
+		 * used for non-remapped IRQ domains.
+		 */
+		if (x86_init.hyper.msi_ext_dest_id()) {
+			virt_ext_dest_id = 1;
+			apic_limit = 32767;
+		}
+
+		/*
 		 * Without IR, all CPUs can be addressed by IOAPIC/MSI only
 		 * in physical mode, and CPUs with an APIC ID that cannnot
 		 * be addressed must not be brought online.
 		 */
-		x2apic_set_max_apicid(255);
+		x2apic_set_max_apicid(apic_limit);
 		x2apic_phys = 1;
 	}
 	x2apic_enable();
@@ -2497,10 +2514,15 @@ void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 	 * Only the IOMMU itself can use the trick of putting destination
 	 * APIC ID into the high bits of the address. Anything else would
 	 * just be writing to memory if it tried that, and needs IR to
-	 * address higher APIC IDs.
+	 * address APICs which can't be addressed in the normal 32-bit
+	 * address range at 0xFFExxxxx. That is typically just 8 bits, but
+	 * some hypervisors allow the extended destination ID field in bits
+	 * 5-11 to be used, giving support for 15 bits of APIC IDs in total.
 	 */
 	if (dmar)
 		msg->arch_addr_hi.destid_8_31 = cfg->dest_apicid >> 8;
+	else if (virt_ext_dest_id && cfg->dest_apicid < 0x8000)
+		msg->arch_addr_lo.virt_destid_8_14 = cfg->dest_apicid >> 8;
 	else
 		WARN_ON_ONCE(cfg->dest_apicid > 0xFF);
 }
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index a3038d8..8b39582 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -110,6 +110,7 @@ struct x86_init_ops x86_init __initdata = {
 		.init_platform		= x86_init_noop,
 		.guest_late_init	= x86_init_noop,
 		.x2apic_available	= bool_x86_init_noop,
+		.msi_ext_dest_id	= bool_x86_init_noop,
 		.init_mem_mapping	= x86_init_noop,
 		.init_after_bootmem	= x86_init_noop,
 	},

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/ioapic: Handle Extended Destination ID field in RTE
  2020-10-24 21:35                     ` [PATCH v3 31/35] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     51130d21881d435fad5fa7f25bea77aa0ffc9a4e
Gitweb:        https://git.kernel.org/tip/51130d21881d435fad5fa7f25bea77aa0ffc9a4e
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:31 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:28 +01:00

x86/ioapic: Handle Extended Destination ID field in RTE

Bits 63-48 of the I/OAPIC Redirection Table Entry map directly to bits 19-4
of the address used in the resulting MSI cycle.

Historically, the x86 MSI format only used the top 8 of those 16 bits as
the destination APIC ID, and the "Extended Destination ID" in the lower 8
bits was unused.

With interrupt remapping, the lowest bit of the Extended Destination ID
(bit 48 of RTE, bit 4 of MSI address) is now used to indicate a remappable
format MSI.

A hypervisor can use the other 7 bits of the Extended Destination ID to
permit guests to address up to 15 bits of APIC IDs, thus allowing 32768
vCPUs before having to expose a vIOMMU and interrupt remapping to the
guest.

No behavioural change in this patch, since nothing yet permits APIC IDs
above 255 to be used with the non-IR I/OAPIC domain.

[ tglx: Converted it to the cleaned up entry/msi_msg format and added
  	commentry ]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-32-dwmw2@infradead.org

---
 arch/x86/include/asm/io_apic.h |  3 ++-
 arch/x86/kernel/apic/io_apic.c | 20 +++++++++++++++-----
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 73da644..437aa8d 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -67,7 +67,8 @@ struct IO_APIC_route_entry {
 				is_level		:  1,
 				masked			:  1,
 				reserved_0		: 15,
-				reserved_1		: 24,
+				reserved_1		: 17,
+				virt_destid_8_14	:  7,
 				destid_0_7		:  8;
 		};
 		struct {
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 443d2c9..1cfd65e 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1238,9 +1238,10 @@ static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
 			       (entry.ir_index_15 << 15) | entry.ir_index_0_14,
 				entry.ir_zero);
 		} else {
-			printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n", buf,
+			printk(KERN_DEBUG "%s, %s, D(%02X%02X), M(%1d)\n", buf,
 			       entry.dest_mode_logical ? "logical " : "physical",
-			       entry.destid_0_7, entry.delivery_mode);
+			       entry.virt_destid_8_14, entry.destid_0_7,
+			       entry.delivery_mode);
 		}
 	}
 }
@@ -1409,6 +1410,7 @@ void native_restore_boot_irq_mode(void)
 	 */
 	if (ioapic_i8259.pin != -1) {
 		struct IO_APIC_route_entry entry;
+		u32 apic_id = read_apic_id();
 
 		memset(&entry, 0, sizeof(entry));
 		entry.masked		= false;
@@ -1416,7 +1418,8 @@ void native_restore_boot_irq_mode(void)
 		entry.active_low	= false;
 		entry.dest_mode_logical	= false;
 		entry.delivery_mode	= APIC_DELIVERY_MODE_EXTINT;
-		entry.destid_0_7	= read_apic_id();
+		entry.destid_0_7	= apic_id & 0xFF;
+		entry.virt_destid_8_14	= apic_id >> 8;
 
 		/*
 		 * Add it to the IO-APIC irq-routing table:
@@ -1885,7 +1888,11 @@ static void ioapic_setup_msg_from_msi(struct irq_data *irq_data,
 	/* DMAR/IR: 1, 0 for all other modes */
 	entry->ir_format		= msg.arch_addr_lo.dmar_format;
 	/*
-	 * DMAR/IR: index bit 0-14.
+	 * - DMAR/IR: index bit 0-14.
+	 *
+	 * - Virt: If the host supports x2apic without a virtualized IR
+	 *	   unit then bit 0-6 of dmar_index_0_14 are providing bit
+	 *	   8-14 of the destination id.
 	 *
 	 * All other modes have bit 0-6 of dmar_index_0_14 cleared and the
 	 * topmost 8 bits are destination id bit 0-7 (entry::destid_0_7).
@@ -2063,6 +2070,7 @@ static inline void __init unlock_ExtINT_logic(void)
 	int apic, pin, i;
 	struct IO_APIC_route_entry entry0, entry1;
 	unsigned char save_control, save_freq_select;
+	u32 apic_id;
 
 	pin  = find_isa_irq_pin(8, mp_INT);
 	if (pin == -1) {
@@ -2078,11 +2086,13 @@ static inline void __init unlock_ExtINT_logic(void)
 	entry0 = ioapic_read_entry(apic, pin);
 	clear_IO_APIC_pin(apic, pin);
 
+	apic_id = hard_smp_processor_id();
 	memset(&entry1, 0, sizeof(entry1));
 
 	entry1.dest_mode_logical	= true;
 	entry1.masked			= false;
-	entry1.destid_0_7		= hard_smp_processor_id();
+	entry1.destid_0_7		= apic_id & 0xFF;
+	entry1.virt_destid_8_14		= apic_id >> 8;
 	entry1.delivery_mode		= APIC_DELIVERY_MODE_EXTINT;
 	entry1.active_low		= entry0.active_low;
 	entry1.is_level			= false;

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86: Kill all traces of irq_remapping_get_irq_domain()
  2020-10-24 21:35                     ` [PATCH v3 29/35] x86: Kill all traces of irq_remapping_get_irq_domain() David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     ed381fca47122f0787ee53b97e5f9d562eec7237
Gitweb:        https://git.kernel.org/tip/ed381fca47122f0787ee53b97e5f9d562eec7237
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:29 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:28 +01:00

x86: Kill all traces of irq_remapping_get_irq_domain()

All users are converted to use the fwspec based parent domain lookup.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-30-dwmw2@infradead.org

---
 arch/x86/include/asm/hw_irq.h        |  2 +--
 arch/x86/include/asm/irq_remapping.h |  9 +-------
 drivers/iommu/amd/iommu.c            | 34 +---------------------------
 drivers/iommu/hyperv-iommu.c         |  9 +-------
 drivers/iommu/intel/irq_remapping.c  | 17 +--------------
 drivers/iommu/irq_remapping.c        | 14 +-----------
 drivers/iommu/irq_remapping.h        |  3 +--
 7 files changed, 88 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 83a69f6..458f5a6 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -40,8 +40,6 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_PCI_MSIX,
 	X86_IRQ_ALLOC_TYPE_DMAR,
 	X86_IRQ_ALLOC_TYPE_UV,
-	X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT,
-	X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
 struct ioapic_alloc_info {
diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index af4a151..7cc4943 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -44,9 +44,6 @@ extern int irq_remapping_reenable(int);
 extern int irq_remap_enable_fault_handling(void);
 extern void panic_if_irq_remap(const char *msg);
 
-extern struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info);
-
 /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
 extern struct irq_domain *
 arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int id);
@@ -71,11 +68,5 @@ static inline void panic_if_irq_remap(const char *msg)
 {
 }
 
-static inline struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info)
-{
-	return NULL;
-}
-
 #endif /* CONFIG_IRQ_REMAP */
 #endif /* __X86_IRQ_REMAPPING_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 31b2224..463d322 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3601,10 +3601,8 @@ static int get_devid(struct irq_alloc_info *info)
 {
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
 		return get_ioapic_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_HPET:
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
 		return get_hpet_devid(info->devid);
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
@@ -3615,44 +3613,12 @@ static int get_devid(struct irq_alloc_info *info)
 	}
 }
 
-static struct irq_domain *get_irq_domain_for_devid(struct irq_alloc_info *info,
-						   int devid)
-{
-	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
-	if (!iommu)
-		return NULL;
-
-	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		return iommu->ir_domain;
-	default:
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-}
-
-static struct irq_domain *get_irq_domain(struct irq_alloc_info *info)
-{
-	int devid;
-
-	if (!info)
-		return NULL;
-
-	devid = get_devid(info);
-	if (devid < 0)
-		return NULL;
-	return get_irq_domain_for_devid(info, devid);
-}
-
 struct irq_remap_ops amd_iommu_irq_ops = {
 	.prepare		= amd_iommu_prepare,
 	.enable			= amd_iommu_enable,
 	.disable		= amd_iommu_disable,
 	.reenable		= amd_iommu_reenable,
 	.enable_faulting	= amd_iommu_enable_faulting,
-	.get_irq_domain		= get_irq_domain,
 };
 
 static void fill_msi_msg(struct msi_msg *msg, u32 index)
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 78a264a..a629a6b 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -160,18 +160,9 @@ static int __init hyperv_enable_irq_remapping(void)
 	return IRQ_REMAP_X2APIC_MODE;
 }
 
-static struct irq_domain *hyperv_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT)
-		return ioapic_ir_domain;
-	else
-		return NULL;
-}
-
 struct irq_remap_ops hyperv_irq_remap_ops = {
 	.prepare		= hyperv_prepare_irq_remapping,
 	.enable			= hyperv_enable_irq_remapping,
-	.get_irq_domain		= hyperv_get_irq_domain,
 };
 
 #endif
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index b3b079c..bca4401 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1127,29 +1127,12 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 	irte->redir_hint = 1;
 }
 
-static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (!info)
-		return NULL;
-
-	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-		return map_ioapic_to_ir(info->devid);
-	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		return map_hpet_to_ir(info->devid);
-	default:
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-}
-
 struct irq_remap_ops intel_irq_remap_ops = {
 	.prepare		= intel_prepare_irq_remapping,
 	.enable			= intel_enable_irq_remapping,
 	.disable		= disable_irq_remapping,
 	.reenable		= reenable_irq_remapping,
 	.enable_faulting	= enable_drhd_fault_handling,
-	.get_irq_domain		= intel_get_irq_domain,
 };
 
 static void intel_ir_reconfigure_irte(struct irq_data *irqd, bool force)
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 2d84b1e..83314b9 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -158,17 +158,3 @@ void panic_if_irq_remap(const char *msg)
 	if (irq_remapping_enabled)
 		panic(msg);
 }
-
-/**
- * irq_remapping_get_irq_domain - Get the irqdomain serving the request @info
- * @info: interrupt allocation information, used to identify the IOMMU device
- *
- * Returns pointer to IRQ domain, or NULL on failure.
- */
-struct irq_domain *irq_remapping_get_irq_domain(struct irq_alloc_info *info)
-{
-	if (!remap_ops || !remap_ops->get_irq_domain)
-		return NULL;
-
-	return remap_ops->get_irq_domain(info);
-}
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 1661b3d..8c89cb9 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -42,9 +42,6 @@ struct irq_remap_ops {
 
 	/* Enable fault handling */
 	int  (*enable_faulting)(void);
-
-	/* Get the irqdomain associated to IOMMU device */
-	struct irq_domain *(*get_irq_domain)(struct irq_alloc_info *);
 };
 
 extern struct irq_remap_ops intel_irq_remap_ops;

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] iommu/vt-d: Simplify intel_irq_remapping_select()
  2020-10-24 21:35                     ` [PATCH v3 30/35] iommu/vt-d: Simplify intel_irq_remapping_select() David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     79eb3581bcaae9b5677629d945e14da212aa76e2
Gitweb:        https://git.kernel.org/tip/79eb3581bcaae9b5677629d945e14da212aa76e2
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:30 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:28 +01:00

iommu/vt-d: Simplify intel_irq_remapping_select()

Now that the old get_irq_domain() method has gone, consolidate on just the
map_XXX_to_iommu() functions.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-31-dwmw2@infradead.org

---
 drivers/iommu/intel/irq_remapping.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index bca4401..aeffda9 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -203,13 +203,13 @@ static int modify_irte(struct irq_2_iommu *irq_iommu,
 	return rc;
 }
 
-static struct irq_domain *map_hpet_to_ir(u8 hpet_id)
+static struct intel_iommu *map_hpet_to_iommu(u8 hpet_id)
 {
 	int i;
 
 	for (i = 0; i < MAX_HPET_TBS; i++) {
 		if (ir_hpet[i].id == hpet_id && ir_hpet[i].iommu)
-			return ir_hpet[i].iommu->ir_domain;
+			return ir_hpet[i].iommu;
 	}
 	return NULL;
 }
@@ -225,13 +225,6 @@ static struct intel_iommu *map_ioapic_to_iommu(int apic)
 	return NULL;
 }
 
-static struct irq_domain *map_ioapic_to_ir(int apic)
-{
-	struct intel_iommu *iommu = map_ioapic_to_iommu(apic);
-
-	return iommu ? iommu->ir_domain : NULL;
-}
-
 static struct irq_domain *map_dev_to_ir(struct pci_dev *dev)
 {
 	struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev);
@@ -1418,12 +1411,14 @@ static int intel_irq_remapping_select(struct irq_domain *d,
 				      struct irq_fwspec *fwspec,
 				      enum irq_domain_bus_token bus_token)
 {
+	struct intel_iommu *iommu = NULL;
+
 	if (x86_fwspec_is_ioapic(fwspec))
-		return d == map_ioapic_to_ir(fwspec->param[0]);
+		iommu = map_ioapic_to_iommu(fwspec->param[0]);
 	else if (x86_fwspec_is_hpet(fwspec))
-		return d == map_hpet_to_ir(fwspec->param[0]);
+		iommu = map_hpet_to_iommu(fwspec->param[0]);
 
-	return 0;
+	return iommu && d == iommu->ir_domain;
 }
 
 static const struct irq_domain_ops intel_ir_domain_ops = {

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
  2020-10-24 21:35                     ` [PATCH v3 28/35] x86/ioapic: " David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     b643128b917ca8f1c8b1e14af64ebdc81147b2d1
Gitweb:        https://git.kernel.org/tip/b643128b917ca8f1c8b1e14af64ebdc81147b2d1
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:28 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:28 +01:00

x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain

All possible parent domains have a select method now. Make use of it.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-29-dwmw2@infradead.org

---
 arch/x86/kernel/apic/io_apic.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ea983da..443d2c9 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2320,36 +2320,37 @@ out:
 
 static int mp_irqdomain_create(int ioapic)
 {
-	struct irq_alloc_info info;
 	struct irq_domain *parent;
 	int hwirqs = mp_ioapic_pin_count(ioapic);
 	struct ioapic *ip = &ioapics[ioapic];
 	struct ioapic_domain_cfg *cfg = &ip->irqdomain_cfg;
 	struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(ioapic);
 	struct fwnode_handle *fn;
-	char *name = "IO-APIC";
+	struct irq_fwspec fwspec;
 
 	if (cfg->type == IOAPIC_DOMAIN_INVALID)
 		return 0;
 
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
-	info.devid = mpc_ioapic_id(ioapic);
-	parent = irq_remapping_get_irq_domain(&info);
-	if (!parent)
-		parent = x86_vector_domain;
-	else
-		name = "IO-APIC-IR";
-
 	/* Handle device tree enumerated APICs proper */
 	if (cfg->dev) {
 		fn = of_node_to_fwnode(cfg->dev);
 	} else {
-		fn = irq_domain_alloc_named_id_fwnode(name, ioapic);
+		fn = irq_domain_alloc_named_id_fwnode("IO-APIC", ioapic);
 		if (!fn)
 			return -ENOMEM;
 	}
 
+	fwspec.fwnode = fn;
+	fwspec.param_count = 1;
+	fwspec.param[0] = ioapic;
+
+	parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+	if (!parent) {
+		if (!cfg->dev)
+			irq_domain_free_fwnode(fn);
+		return -ENODEV;
+	}
+
 	ip->irqdomain = irq_domain_create_linear(fn, hwirqs, cfg->ops,
 						 (void *)(long)ioapic);
 

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] iommu/hyper-v: Implement select() method on remapping irqdomain
  2020-10-24 21:35                     ` [PATCH v3 26/35] iommu/hyper-v: " David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     a491bb19f728cdb8cc1f4734ecc57c0afa099fac
Gitweb:        https://git.kernel.org/tip/a491bb19f728cdb8cc1f4734ecc57c0afa099fac
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:26 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:28 +01:00

iommu/hyper-v: Implement select() method on remapping irqdomain

Preparatory for removing irq_remapping_get_irq_domain()

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-27-dwmw2@infradead.org

---
 drivers/iommu/hyperv-iommu.c |  9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 37dd485..78a264a 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -101,7 +101,16 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain,
 	irq_domain_free_irqs_common(domain, virq, nr_irqs);
 }
 
+static int hyperv_irq_remapping_select(struct irq_domain *d,
+				       struct irq_fwspec *fwspec,
+				       enum irq_domain_bus_token bus_token)
+{
+	/* Claim only the first (and only) I/OAPIC */
+	return x86_fwspec_is_ioapic(fwspec) && fwspec->param[0] == 0;
+}
+
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
+	.select = hyperv_irq_remapping_select,
 	.alloc = hyperv_irq_remapping_alloc,
 	.free = hyperv_irq_remapping_free,
 };

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
  2020-10-24 21:35                     ` [PATCH v3 27/35] x86/hpet: Use irq_find_matching_fwspec() to find " David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     c2a5881c28e5bb4cb901029423a1c7068c0afa2d
Gitweb:        https://git.kernel.org/tip/c2a5881c28e5bb4cb901029423a1c7068c0afa2d
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:27 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:28 +01:00

x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain

All possible parent domains have a select method now. Make use of it.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-28-dwmw2@infradead.org

---
 arch/x86/kernel/hpet.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3b8b127..08651a4 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -543,8 +543,8 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 {
 	struct msi_domain_info *domain_info;
 	struct irq_domain *parent, *d;
-	struct irq_alloc_info info;
 	struct fwnode_handle *fn;
+	struct irq_fwspec fwspec;
 
 	if (x86_vector_domain == NULL)
 		return NULL;
@@ -556,15 +556,6 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 	*domain_info = hpet_msi_domain_info;
 	domain_info->data = (void *)(long)hpet_id;
 
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-	info.devid = hpet_id;
-	parent = irq_remapping_get_irq_domain(&info);
-	if (parent == NULL)
-		parent = x86_vector_domain;
-	else
-		hpet_msi_controller.name = "IR-HPET-MSI";
-
 	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
 					      hpet_id);
 	if (!fn) {
@@ -572,6 +563,19 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id)
 		return NULL;
 	}
 
+	fwspec.fwnode = fn;
+	fwspec.param_count = 1;
+	fwspec.param[0] = hpet_id;
+
+	parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+	if (!parent) {
+		irq_domain_free_fwnode(fn);
+		kfree(domain_info);
+		return NULL;
+	}
+	if (parent != x86_vector_domain)
+		hpet_msi_controller.name = "IR-HPET-MSI";
+
 	d = msi_create_irq_domain(fn, domain_info, parent);
 	if (!d) {
 		irq_domain_free_fwnode(fn);

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] iommu/vt-d: Implement select() method on remapping irqdomain
  2020-10-24 21:35                     ` [PATCH v3 25/35] iommu/vt-d: " David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     a87fb465ffe8eacd0d69032da33455e4f6fd8b41
Gitweb:        https://git.kernel.org/tip/a87fb465ffe8eacd0d69032da33455e4f6fd8b41
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:25 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:27 +01:00

iommu/vt-d: Implement select() method on remapping irqdomain

Preparatory for removing irq_remapping_get_irq_domain()

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-26-dwmw2@infradead.org

---
 drivers/iommu/intel/irq_remapping.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 96c84b1..b3b079c 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1431,7 +1431,20 @@ static void intel_irq_remapping_deactivate(struct irq_domain *domain,
 	modify_irte(&data->irq_2_iommu, &entry);
 }
 
+static int intel_irq_remapping_select(struct irq_domain *d,
+				      struct irq_fwspec *fwspec,
+				      enum irq_domain_bus_token bus_token)
+{
+	if (x86_fwspec_is_ioapic(fwspec))
+		return d == map_ioapic_to_ir(fwspec->param[0]);
+	else if (x86_fwspec_is_hpet(fwspec))
+		return d == map_hpet_to_ir(fwspec->param[0]);
+
+	return 0;
+}
+
 static const struct irq_domain_ops intel_ir_domain_ops = {
+	.select = intel_irq_remapping_select,
 	.alloc = intel_irq_remapping_alloc,
 	.free = intel_irq_remapping_free,
 	.activate = intel_irq_remapping_activate,

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] iommu/amd: Implement select() method on remapping irqdomain
  2020-10-24 21:35                     ` [PATCH v3 24/35] iommu/amd: Implement select() method on remapping irqdomain David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     a1a785b572425ab3ca5494a4be02ab59a796df51
Gitweb:        https://git.kernel.org/tip/a1a785b572425ab3ca5494a4be02ab59a796df51
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:24 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:27 +01:00

iommu/amd: Implement select() method on remapping irqdomain

Preparatory change to remove irq_remapping_get_irq_domain().

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-25-dwmw2@infradead.org

---
 drivers/iommu/amd/iommu.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 9744cdb..31b2224 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3882,7 +3882,26 @@ static void irq_remapping_deactivate(struct irq_domain *domain,
 					    irte_info->index);
 }
 
+static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+				enum irq_domain_bus_token bus_token)
+{
+	struct amd_iommu *iommu;
+	int devid = -1;
+
+	if (x86_fwspec_is_ioapic(fwspec))
+		devid = get_ioapic_devid(fwspec->param[0]);
+	else if (x86_fwspec_is_hpet(fwspec))
+		devid = get_hpet_devid(fwspec->param[0]);
+
+	if (devid < 0)
+		return 0;
+
+	iommu = amd_iommu_rlookup_table[devid];
+	return iommu && iommu->ir_domain == d;
+}
+
 static const struct irq_domain_ops amd_ir_domain_ops = {
+	.select = irq_remapping_select,
 	.alloc = irq_remapping_alloc,
 	.free = irq_remapping_free,
 	.activate = irq_remapping_activate,

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/apic: Add select() method on vector irqdomain
  2020-10-24 21:35                     ` [PATCH v3 23/35] x86/apic: Add select() method on vector irqdomain David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     6452ea2a323b80868ce5e6d3030e4ccbeab9dc30
Gitweb:        https://git.kernel.org/tip/6452ea2a323b80868ce5e6d3030e4ccbeab9dc30
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:23 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:27 +01:00

x86/apic: Add select() method on vector irqdomain

This will be used to select the irqdomain for I/O-APIC and HPET.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-24-dwmw2@infradead.org

---
 arch/x86/include/asm/irqdomain.h |  3 ++-
 arch/x86/kernel/apic/vector.c    | 43 +++++++++++++++++++++++++++++++-
 2 files changed, 46 insertions(+)

diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
index cd684d4..125c23b 100644
--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -12,6 +12,9 @@ enum {
 	X86_IRQ_ALLOC_LEGACY				= 0x2,
 };
 
+extern int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec);
+extern int x86_fwspec_is_hpet(struct irq_fwspec *fwspec);
+
 extern struct irq_domain *x86_vector_domain;
 
 extern void init_irq_alloc_info(struct irq_alloc_info *info,
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb2e2a2..b9b05ca 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -636,7 +636,50 @@ static void x86_vector_debug_show(struct seq_file *m, struct irq_domain *d,
 }
 #endif
 
+int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec)
+{
+	if (fwspec->param_count != 1)
+		return 0;
+
+	if (is_fwnode_irqchip(fwspec->fwnode)) {
+		const char *fwname = fwnode_get_name(fwspec->fwnode);
+		return fwname && !strncmp(fwname, "IO-APIC-", 8) &&
+			simple_strtol(fwname+8, NULL, 10) == fwspec->param[0];
+	}
+	return to_of_node(fwspec->fwnode) &&
+		of_device_is_compatible(to_of_node(fwspec->fwnode),
+					"intel,ce4100-ioapic");
+}
+
+int x86_fwspec_is_hpet(struct irq_fwspec *fwspec)
+{
+	if (fwspec->param_count != 1)
+		return 0;
+
+	if (is_fwnode_irqchip(fwspec->fwnode)) {
+		const char *fwname = fwnode_get_name(fwspec->fwnode);
+		return fwname && !strncmp(fwname, "HPET-MSI-", 9) &&
+			simple_strtol(fwname+9, NULL, 10) == fwspec->param[0];
+	}
+	return 0;
+}
+
+static int x86_vector_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+			     enum irq_domain_bus_token bus_token)
+{
+	/*
+	 * HPET and I/OAPIC cannot be parented in the vector domain
+	 * if IRQ remapping is enabled. APIC IDs above 15 bits are
+	 * only permitted if IRQ remapping is enabled, so check that.
+	 */
+	if (apic->apic_id_valid(32768))
+		return 0;
+
+	return x86_fwspec_is_ioapic(fwspec) || x86_fwspec_is_hpet(fwspec);
+}
+
 static const struct irq_domain_ops x86_vector_domain_ops = {
+	.select		= x86_vector_select,
 	.alloc		= x86_vector_alloc_irqs,
 	.free		= x86_vector_free_irqs,
 	.activate	= x86_vector_activate,

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] genirq/irqdomain: Implement get_name() method on irqchip fwnodes
  2020-10-24 21:35                     ` [PATCH v3 22/35] genirq/irqdomain: Implement get_name() method on irqchip fwnodes David Woodhouse
  2020-10-25  9:41                       ` Marc Zyngier
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  1 sibling, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: David Woodhouse, Thomas Gleixner, Marc Zyngier, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     2cbd5a45e5296b28d64224ffbbd33d427704ba1b
Gitweb:        https://git.kernel.org/tip/2cbd5a45e5296b28d64224ffbbd33d427704ba1b
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:22 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:27 +01:00

genirq/irqdomain: Implement get_name() method on irqchip fwnodes

Prerequisite to make x86 more irqdomain compliant.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201024213535.443185-23-dwmw2@infradead.org

---
 kernel/irq/irqdomain.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf8b374..673fa64 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -42,7 +42,16 @@ static inline void debugfs_add_domain_dir(struct irq_domain *d) { }
 static inline void debugfs_remove_domain_dir(struct irq_domain *d) { }
 #endif
 
-const struct fwnode_operations irqchip_fwnode_ops;
+static const char *irqchip_fwnode_get_name(const struct fwnode_handle *fwnode)
+{
+	struct irqchip_fwid *fwid = container_of(fwnode, struct irqchip_fwid, fwnode);
+
+	return fwid->name;
+}
+
+const struct fwnode_operations irqchip_fwnode_ops = {
+	.get_name = irqchip_fwnode_get_name,
+};
 EXPORT_SYMBOL_GPL(irqchip_fwnode_ops);
 
 /**

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  2020-10-24 21:35                     ` [PATCH v3 21/35] x86/ioapic: Generate RTE directly from parent irqchip's MSI message David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     5d5a97133887b2dfd8e2ad0347c3a02cc7aaa0cb
Gitweb:        https://git.kernel.org/tip/5d5a97133887b2dfd8e2ad0347c3a02cc7aaa0cb
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:21 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:27 +01:00

x86/ioapic: Generate RTE directly from parent irqchip's MSI message

The I/O-APIC generates an MSI cycle with address/data bits taken from its
Redirection Table Entry in some combination which used to make sense, but
now is just a bunch of bits which get passed through in some seemingly
arbitrary order.

Instead of making IRQ remapping drivers directly frob the I/OA-PIC RTE, let
them just do their job and generate an MSI message. The bit swizzling to
turn that MSI message into the I/O-APIC's RTE is the same in all cases,
since it's a function of the I/O-APIC hardware. The IRQ remappers have no
real need to get involved with that.

The only slight caveat is that the I/OAPIC is interpreting some of those
fields too, and it does want the 'vector' field to be unique to make EOI
work. The AMD IOMMU happens to put its IRTE index in the bits that the
I/O-APIC thinks are the vector field, and accommodates this requirement by
reserving the first 32 indices for the I/O-APIC.  The Intel IOMMU doesn't
actually use the bits that the I/O-APIC thinks are the vector field, so it
fills in the 'pin' value there instead.

[ tglx: Replaced the unreadably macro maze with the cleaned up RTE/msi_msg
  	bitfields and added commentry to explain the mapping magic ]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-22-dwmw2@infradead.org

---
 arch/x86/include/asm/hw_irq.h       |  11 +--
 arch/x86/kernel/apic/io_apic.c      | 103 ++++++++++++++++++---------
 drivers/iommu/amd/iommu.c           |  12 +---
 drivers/iommu/hyperv-iommu.c        |  31 +--------
 drivers/iommu/intel/irq_remapping.c |  31 +-------
 5 files changed, 83 insertions(+), 105 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 517847a..83a69f6 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -45,12 +45,11 @@ enum irq_alloc_type {
 };
 
 struct ioapic_alloc_info {
-	int				pin;
-	int				node;
-	u32				is_level	: 1;
-	u32				active_low	: 1;
-	u32				valid		: 1;
-	struct IO_APIC_route_entry	*entry;
+	int		pin;
+	int		node;
+	u32		is_level	: 1;
+	u32		active_low	: 1;
+	u32		valid		: 1;
 };
 
 struct uv_alloc_info {
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 07e7541..ea983da 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -48,6 +48,7 @@
 #include <linux/jiffies.h>	/* time_after() */
 #include <linux/slab.h>
 #include <linux/memblock.h>
+#include <linux/msi.h>
 
 #include <asm/irqdomain.h>
 #include <asm/io.h>
@@ -63,7 +64,6 @@
 #include <asm/setup.h>
 #include <asm/irq_remapping.h>
 #include <asm/hw_irq.h>
-
 #include <asm/apic.h>
 
 #define	for_each_ioapic(idx)		\
@@ -1848,21 +1848,58 @@ static void ioapic_ir_ack_level(struct irq_data *irq_data)
 	eoi_ioapic_pin(data->entry.vector, data);
 }
 
+/*
+ * The I/OAPIC is just a device for generating MSI messages from legacy
+ * interrupt pins. Various fields of the RTE translate into bits of the
+ * resulting MSI which had a historical meaning.
+ *
+ * With interrupt remapping, many of those bits have different meanings
+ * in the underlying MSI, but the way that the I/OAPIC transforms them
+ * from its RTE to the MSI message is the same. This function allows
+ * the parent IRQ domain to compose the MSI message, then takes the
+ * relevant bits to put them in the appropriate places in the RTE in
+ * order to generate that message when the IRQ happens.
+ *
+ * The setup here relies on a preconfigured route entry (is_level,
+ * active_low, masked) because the parent domain is merely composing the
+ * generic message routing information which is used for the MSI.
+ */
+static void ioapic_setup_msg_from_msi(struct irq_data *irq_data,
+				      struct IO_APIC_route_entry *entry)
+{
+	struct msi_msg msg;
+
+	/* Let the parent domain compose the MSI message */
+	irq_chip_compose_msi_msg(irq_data, &msg);
+
+	/*
+	 * - Real vector
+	 * - DMAR/IR: 8bit subhandle (ioapic.pin)
+	 * - AMD/IR:  8bit IRTE index
+	 */
+	entry->vector			= msg.arch_data.vector;
+	/* Delivery mode (for DMAR/IR all 0) */
+	entry->delivery_mode		= msg.arch_data.delivery_mode;
+	/* Destination mode or DMAR/IR index bit 15 */
+	entry->dest_mode_logical	= msg.arch_addr_lo.dest_mode_logical;
+	/* DMAR/IR: 1, 0 for all other modes */
+	entry->ir_format		= msg.arch_addr_lo.dmar_format;
+	/*
+	 * DMAR/IR: index bit 0-14.
+	 *
+	 * All other modes have bit 0-6 of dmar_index_0_14 cleared and the
+	 * topmost 8 bits are destination id bit 0-7 (entry::destid_0_7).
+	 */
+	entry->ir_index_0_14		= msg.arch_addr_lo.dmar_index_0_14;
+}
+
 static void ioapic_configure_entry(struct irq_data *irqd)
 {
 	struct mp_chip_data *mpd = irqd->chip_data;
-	struct irq_cfg *cfg = irqd_cfg(irqd);
 	struct irq_pin_list *entry;
 
-	/*
-	 * Only update when the parent is the vector domain, don't touch it
-	 * if the parent is the remapping domain. Check the installed
-	 * ioapic chip to verify that.
-	 */
-	if (irqd->chip == &ioapic_chip) {
-		mpd->entry.destid_0_7 = cfg->dest_apicid;
-		mpd->entry.vector = cfg->vector;
-	}
+	ioapic_setup_msg_from_msi(irqd, &mpd->entry);
+
 	for_each_irq_pin(entry, mpd->irq_2_pin)
 		__ioapic_write_entry(entry->apic, entry->pin, mpd->entry);
 }
@@ -2919,14 +2956,23 @@ static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data,
 	}
 }
 
-static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
-			   struct IO_APIC_route_entry *entry)
+/*
+ * Configure the I/O-APIC specific fields in the routing entry.
+ *
+ * This is important to setup the I/O-APIC specific bits (is_level,
+ * active_low, masked) because the underlying parent domain will only
+ * provide the routing information and is oblivious of the I/O-APIC
+ * specific bits.
+ *
+ * The entry is just preconfigured at this point and not written into the
+ * RTE. This happens later during activation which will fill in the actual
+ * routing information.
+ */
+static void mp_preconfigure_entry(struct mp_chip_data *data)
 {
+	struct IO_APIC_route_entry *entry = &data->entry;
+
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode	 = apic->delivery_mode;
-	entry->dest_mode_logical = apic->dest_mode_logical;
-	entry->destid_0_7	 = cfg->dest_apicid;
-	entry->vector		 = cfg->vector;
 	entry->is_level		 = data->is_level;
 	entry->active_low	 = data->active_low;
 	/*
@@ -2939,11 +2985,10 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 		       unsigned int nr_irqs, void *arg)
 {
-	int ret, ioapic, pin;
-	struct irq_cfg *cfg;
-	struct irq_data *irq_data;
-	struct mp_chip_data *data;
 	struct irq_alloc_info *info = arg;
+	struct mp_chip_data *data;
+	struct irq_data *irq_data;
+	int ret, ioapic, pin;
 	unsigned long flags;
 
 	if (!info || nr_irqs > 1)
@@ -2961,7 +3006,6 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	if (!data)
 		return -ENOMEM;
 
-	info->ioapic.entry = &data->entry;
 	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info);
 	if (ret < 0) {
 		kfree(data);
@@ -2975,23 +3019,20 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	irq_data->chip_data = data;
 	mp_irqdomain_get_attr(mp_pin_to_gsi(ioapic, pin), data, info);
 
-	cfg = irqd_cfg(irq_data);
 	add_pin_to_irq_node(data, ioapic_alloc_attr_node(info), ioapic, pin);
 
-	local_irq_save(flags);
-	if (info->ioapic.entry)
-		mp_setup_entry(cfg, data, info->ioapic.entry);
+	mp_preconfigure_entry(data);
 	mp_register_handler(virq, data->is_level);
+
+	local_irq_save(flags);
 	if (virq < nr_legacy_irqs())
 		legacy_pic->mask(virq);
 	local_irq_restore(flags);
 
 	apic_printk(APIC_VERBOSE, KERN_DEBUG
-		    "IOAPIC[%d]: Set routing entry (%d-%d -> 0x%x -> IRQ %d Mode:%i Active:%i Dest:%d)\n",
-		    ioapic, mpc_ioapic_id(ioapic), pin, cfg->vector,
-		    virq, data->is_level, data->active_low,
-		    cfg->dest_apicid);
-
+		    "IOAPIC[%d]: Preconfigured routing entry (%d-%d -> IRQ %d Level:%i ActiveLow:%i)\n",
+		    ioapic, mpc_ioapic_id(ioapic), pin, virq,
+		    data->is_level, data->active_low);
 	return 0;
 }
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3d72ec7..9744cdb 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3669,7 +3669,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 				       int devid, int index, int sub_handle)
 {
 	struct irq_2_irte *irte_info = &data->irq_2_irte;
-	struct IO_APIC_route_entry *entry;
 	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
 	if (!iommu)
@@ -3683,17 +3682,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-		/* Setup IOAPIC entry */
-		entry = info->ioapic.entry;
-		info->ioapic.entry = NULL;
-		memset(entry, 0, sizeof(*entry));
-		entry->vector		= index;
-		entry->is_level		= info->ioapic.is_level;
-		entry->active_low	= info->ioapic.active_low;
-		/* Mask level triggered irqs. */
-		entry->masked		= info->ioapic.is_level;
-		break;
-
 	case X86_IRQ_ALLOC_TYPE_HPET:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 1ab7eb9..37dd485 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -40,7 +40,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 {
 	struct irq_data *parent = data->parent_data;
 	struct irq_cfg *cfg = irqd_cfg(data);
-	struct IO_APIC_route_entry *entry;
 	int ret;
 
 	/* Return error If new irq affinity is out of ioapic_max_cpumask. */
@@ -51,9 +50,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
 		return ret;
 
-	entry = data->chip_data;
-	entry->destid_0_7 = cfg->dest_apicid;
-	entry->vector = cfg->vector;
 	send_cleanup_vector(cfg);
 
 	return 0;
@@ -90,20 +86,6 @@ static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
 	irq_data->chip = &hyperv_ir_chip;
 
 	/*
-	 * If there is interrupt remapping function of IOMMU, setting irq
-	 * affinity only needs to change IRTE of IOMMU. But Hyper-V doesn't
-	 * support interrupt remapping function, setting irq affinity of IO-APIC
-	 * interrupts still needs to change IO-APIC registers. But ioapic_
-	 * configure_entry() will ignore value of cfg->vector and cfg->
-	 * dest_apicid when IO-APIC's parent irq domain is not the vector
-	 * domain.(See ioapic_configure_entry()) In order to setting vector
-	 * and dest_apicid to IO-APIC register, IO-APIC entry pointer is saved
-	 * in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_
-	 * affinity() set vector and dest_apicid directly into IO-APIC entry.
-	 */
-	irq_data->chip_data = info->ioapic.entry;
-
-	/*
 	 * Hypver-V IO APIC irq affinity should be in the scope of
 	 * ioapic_max_cpumask because no irq remapping support.
 	 */
@@ -119,22 +101,9 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain,
 	irq_domain_free_irqs_common(domain, virq, nr_irqs);
 }
 
-static int hyperv_irq_remapping_activate(struct irq_domain *domain,
-			  struct irq_data *irq_data, bool reserve)
-{
-	struct irq_cfg *cfg = irqd_cfg(irq_data);
-	struct IO_APIC_route_entry *entry = irq_data->chip_data;
-
-	entry->destid_0_7 = cfg->dest_apicid;
-	entry->vector = cfg->vector;
-
-	return 0;
-}
-
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
 	.alloc = hyperv_irq_remapping_alloc,
 	.free = hyperv_irq_remapping_free,
-	.activate = hyperv_irq_remapping_activate,
 };
 
 static int __init hyperv_prepare_irq_remapping(void)
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 625bdb9..96c84b1 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1280,9 +1280,9 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 					     int index, int sub_handle)
 {
 	struct irte *irte = &data->irte_entry;
-	struct IO_APIC_route_entry *entry;
 
 	prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
+
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
 		/* Set source-id of interrupt request */
@@ -1293,39 +1293,20 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 			irte->trigger_mode, irte->dlvry_mode,
 			irte->avail, irte->vector, irte->dest_id,
 			irte->sid, irte->sq, irte->svt);
-
-		entry = info->ioapic.entry;
-		info->ioapic.entry = NULL;
-		memset(entry, 0, sizeof(*entry));
-		entry->ir_index_15	= !!(index & 0x8000);
-		entry->ir_format	= true;
-		entry->ir_index_0_14	= index & 0x7fff;
-		/*
-		 * IO-APIC RTE will be configured with virtual vector.
-		 * irq handler will do the explicit EOI to the io-apic.
-		 */
-		entry->vector		= info->ioapic.pin;
-		entry->is_level		= info->ioapic.is_level;
-		entry->active_low	= info->ioapic.active_low;
-		/* Mask level triggered irqs. */
-		entry->masked		= info->ioapic.is_level;
+		sub_handle = info->ioapic.pin;
 		break;
-
 	case X86_IRQ_ALLOC_TYPE_HPET:
+		set_hpet_sid(irte, info->devid);
+		break;
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		if (info->type == X86_IRQ_ALLOC_TYPE_HPET)
-			set_hpet_sid(irte, info->devid);
-		else
-			set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
-
-		fill_msi_msg(&data->msi_entry, index, sub_handle);
+		set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
 		break;
-
 	default:
 		BUG_ON(1);
 		break;
 	}
+	fill_msi_msg(&data->msi_entry, index, sub_handle);
 }
 
 static void intel_free_irq_resources(struct irq_domain *domain,

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/ioapic: Cleanup IO/APIC route entry structs
  2020-10-24 21:35                     ` [PATCH v3 20/35] x86/ioapic: Cleanup IO/APIC route entry structs David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     341b4a7211b6ba3a7089e1dc09ac4bd576dfb05f
Gitweb:        https://git.kernel.org/tip/341b4a7211b6ba3a7089e1dc09ac4bd576dfb05f
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:20 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:27 +01:00

x86/ioapic: Cleanup IO/APIC route entry structs

Having two seperate structs for the I/O-APIC RTE entries (non-remapped and
DMAR remapped) requires type casts and makes it hard to map.

Combine them in IO_APIC_routing_entry by defining a union of two 64bit
bitfields. Use naming which reflects which bits are shared and which bits
are actually different for the operating modes.

[dwmw2: Fix it up and finish the job, pulling the 32-bit w1,w2 words for
        register access into the same union and eliminating a few more
        places where bits were accessed through masks and shifts.]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-21-dwmw2@infradead.org

---
 arch/x86/include/asm/io_apic.h      |  78 +++++----------
 arch/x86/kernel/apic/io_apic.c      | 144 ++++++++++++---------------
 drivers/iommu/amd/iommu.c           |   8 +-
 drivers/iommu/hyperv-iommu.c        |   4 +-
 drivers/iommu/intel/irq_remapping.c |  19 +---
 5 files changed, 108 insertions(+), 145 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index a1a26f6..73da644 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -13,15 +13,6 @@
  * Copyright (C) 1997, 1998, 1999, 2000 Ingo Molnar
  */
 
-/* I/O Unit Redirection Table */
-#define IO_APIC_REDIR_VECTOR_MASK	0x000FF
-#define IO_APIC_REDIR_DEST_LOGICAL	0x00800
-#define IO_APIC_REDIR_DEST_PHYSICAL	0x00000
-#define IO_APIC_REDIR_SEND_PENDING	(1 << 12)
-#define IO_APIC_REDIR_REMOTE_IRR	(1 << 14)
-#define IO_APIC_REDIR_LEVEL_TRIGGER	(1 << 15)
-#define IO_APIC_REDIR_MASKED		(1 << 16)
-
 /*
  * The structure of the IO-APIC:
  */
@@ -65,52 +56,39 @@ union IO_APIC_reg_03 {
 };
 
 struct IO_APIC_route_entry {
-	__u32	vector		:  8,
-		delivery_mode	:  3,	/* 000: FIXED
-					 * 001: lowest prio
-					 * 111: ExtINT
-					 */
-		dest_mode	:  1,	/* 0: physical, 1: logical */
-		delivery_status	:  1,
-		polarity	:  1,
-		irr		:  1,
-		trigger		:  1,	/* 0: edge, 1: level */
-		mask		:  1,	/* 0: enabled, 1: disabled */
-		__reserved_2	: 15;
-
-	__u32	__reserved_3	: 24,
-		dest		:  8;
-} __attribute__ ((packed));
-
-struct IR_IO_APIC_route_entry {
-	__u64	vector		: 8,
-		zero		: 3,
-		index2		: 1,
-		delivery_status : 1,
-		polarity	: 1,
-		irr		: 1,
-		trigger		: 1,
-		mask		: 1,
-		reserved	: 31,
-		format		: 1,
-		index		: 15;
+	union {
+		struct {
+			u64	vector			:  8,
+				delivery_mode		:  3,
+				dest_mode_logical	:  1,
+				delivery_status		:  1,
+				active_low		:  1,
+				irr			:  1,
+				is_level		:  1,
+				masked			:  1,
+				reserved_0		: 15,
+				reserved_1		: 24,
+				destid_0_7		:  8;
+		};
+		struct {
+			u64	ir_shared_0		:  8,
+				ir_zero			:  3,
+				ir_index_15		:  1,
+				ir_shared_1		:  5,
+				ir_reserved_0		: 31,
+				ir_format		:  1,
+				ir_index_0_14		: 15;
+		};
+		struct {
+			u64	w1			: 32,
+				w2			: 32;
+		};
+	};
 } __attribute__ ((packed));
 
 struct irq_alloc_info;
 struct ioapic_domain_cfg;
 
-#define IOAPIC_EDGE			0
-#define IOAPIC_LEVEL			1
-
-#define IOAPIC_MASKED			1
-#define IOAPIC_UNMASKED			0
-
-#define IOAPIC_POL_HIGH			0
-#define IOAPIC_POL_LOW			1
-
-#define IOAPIC_DEST_MODE_PHYSICAL	0
-#define IOAPIC_DEST_MODE_LOGICAL	1
-
 #define	IOAPIC_MAP_ALLOC		0x1
 #define	IOAPIC_MAP_CHECK		0x2
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 24a7bba..07e7541 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -286,31 +286,26 @@ static void io_apic_write(unsigned int apic, unsigned int reg,
 	writel(value, &io_apic->data);
 }
 
-union entry_union {
-	struct { u32 w1, w2; };
-	struct IO_APIC_route_entry entry;
-};
-
 static struct IO_APIC_route_entry __ioapic_read_entry(int apic, int pin)
 {
-	union entry_union eu;
+	struct IO_APIC_route_entry entry;
 
-	eu.w1 = io_apic_read(apic, 0x10 + 2 * pin);
-	eu.w2 = io_apic_read(apic, 0x11 + 2 * pin);
+	entry.w1 = io_apic_read(apic, 0x10 + 2 * pin);
+	entry.w2 = io_apic_read(apic, 0x11 + 2 * pin);
 
-	return eu.entry;
+	return entry;
 }
 
 static struct IO_APIC_route_entry ioapic_read_entry(int apic, int pin)
 {
-	union entry_union eu;
+	struct IO_APIC_route_entry entry;
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
-	eu.entry = __ioapic_read_entry(apic, pin);
+	entry = __ioapic_read_entry(apic, pin);
 	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 
-	return eu.entry;
+	return entry;
 }
 
 /*
@@ -321,11 +316,8 @@ static struct IO_APIC_route_entry ioapic_read_entry(int apic, int pin)
  */
 static void __ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
 {
-	union entry_union eu = {{0, 0}};
-
-	eu.entry = e;
-	io_apic_write(apic, 0x11 + 2*pin, eu.w2);
-	io_apic_write(apic, 0x10 + 2*pin, eu.w1);
+	io_apic_write(apic, 0x11 + 2*pin, e.w2);
+	io_apic_write(apic, 0x10 + 2*pin, e.w1);
 }
 
 static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
@@ -344,12 +336,12 @@ static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
  */
 static void ioapic_mask_entry(int apic, int pin)
 {
+	struct IO_APIC_route_entry e = { .masked = true };
 	unsigned long flags;
-	union entry_union eu = { .entry.mask = IOAPIC_MASKED };
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
-	io_apic_write(apic, 0x10 + 2*pin, eu.w1);
-	io_apic_write(apic, 0x11 + 2*pin, eu.w2);
+	io_apic_write(apic, 0x10 + 2*pin, e.w1);
+	io_apic_write(apic, 0x11 + 2*pin, e.w2);
 	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
@@ -422,20 +414,15 @@ static void __init replace_pin_at_irq_node(struct mp_chip_data *data, int node,
 	add_pin_to_irq_node(data, node, newapic, newpin);
 }
 
-static void io_apic_modify_irq(struct mp_chip_data *data,
-			       int mask_and, int mask_or,
+static void io_apic_modify_irq(struct mp_chip_data *data, bool masked,
 			       void (*final)(struct irq_pin_list *entry))
 {
-	union entry_union eu;
 	struct irq_pin_list *entry;
 
-	eu.entry = data->entry;
-	eu.w1 &= mask_and;
-	eu.w1 |= mask_or;
-	data->entry = eu.entry;
+	data->entry.masked = masked;
 
 	for_each_irq_pin(entry, data->irq_2_pin) {
-		io_apic_write(entry->apic, 0x10 + 2 * entry->pin, eu.w1);
+		io_apic_write(entry->apic, 0x10 + 2 * entry->pin, data->entry.w1);
 		if (final)
 			final(entry);
 	}
@@ -459,13 +446,13 @@ static void mask_ioapic_irq(struct irq_data *irq_data)
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
-	io_apic_modify_irq(data, ~0, IO_APIC_REDIR_MASKED, &io_apic_sync);
+	io_apic_modify_irq(data, true, &io_apic_sync);
 	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
 static void __unmask_ioapic(struct mp_chip_data *data)
 {
-	io_apic_modify_irq(data, ~IO_APIC_REDIR_MASKED, 0, NULL);
+	io_apic_modify_irq(data, false, NULL);
 }
 
 static void unmask_ioapic_irq(struct irq_data *irq_data)
@@ -506,8 +493,8 @@ static void __eoi_ioapic_pin(int apic, int pin, int vector)
 		/*
 		 * Mask the entry and change the trigger mode to edge.
 		 */
-		entry1.mask = IOAPIC_MASKED;
-		entry1.trigger = IOAPIC_EDGE;
+		entry1.masked = true;
+		entry1.is_level = false;
 
 		__ioapic_write_entry(apic, pin, entry1);
 
@@ -542,8 +529,8 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 	 * Make sure the entry is masked and re-read the contents to check
 	 * if it is a level triggered pin and if the remote-IRR is set.
 	 */
-	if (entry.mask == IOAPIC_UNMASKED) {
-		entry.mask = IOAPIC_MASKED;
+	if (!entry.masked) {
+		entry.masked = true;
 		ioapic_write_entry(apic, pin, entry);
 		entry = ioapic_read_entry(apic, pin);
 	}
@@ -556,8 +543,8 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 		 * doesn't clear the remote-IRR if the trigger mode is not
 		 * set to level.
 		 */
-		if (entry.trigger == IOAPIC_EDGE) {
-			entry.trigger = IOAPIC_LEVEL;
+		if (!entry.is_level) {
+			entry.is_level = true;
 			ioapic_write_entry(apic, pin, entry);
 		}
 		raw_spin_lock_irqsave(&ioapic_lock, flags);
@@ -659,8 +646,8 @@ void mask_ioapic_entries(void)
 			struct IO_APIC_route_entry entry;
 
 			entry = ioapics[apic].saved_registers[pin];
-			if (entry.mask == IOAPIC_UNMASKED) {
-				entry.mask = IOAPIC_MASKED;
+			if (!entry.masked) {
+				entry.masked = true;
 				ioapic_write_entry(apic, pin, entry);
 			}
 		}
@@ -947,8 +934,8 @@ static bool mp_check_pin_attr(int irq, struct irq_alloc_info *info)
 	if (irq < nr_legacy_irqs() && data->count == 1) {
 		if (info->ioapic.is_level != data->is_level)
 			mp_register_handler(irq, info->ioapic.is_level);
-		data->entry.trigger = data->is_level = info->ioapic.is_level;
-		data->entry.polarity = data->active_low = info->ioapic.active_low;
+		data->entry.is_level = data->is_level = info->ioapic.is_level;
+		data->entry.active_low = data->active_low = info->ioapic.active_low;
 	}
 
 	return data->is_level == info->ioapic.is_level &&
@@ -1231,10 +1218,9 @@ void ioapic_zap_locks(void)
 
 static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
 {
-	int i;
-	char buf[256];
 	struct IO_APIC_route_entry entry;
-	struct IR_IO_APIC_route_entry *ir_entry = (void *)&entry;
+	char buf[256];
+	int i;
 
 	printk(KERN_DEBUG "IOAPIC %d:\n", apic);
 	for (i = 0; i <= nr_entries; i++) {
@@ -1242,20 +1228,20 @@ static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
 		snprintf(buf, sizeof(buf),
 			 " pin%02x, %s, %s, %s, V(%02X), IRR(%1d), S(%1d)",
 			 i,
-			 entry.mask == IOAPIC_MASKED ? "disabled" : "enabled ",
-			 entry.trigger == IOAPIC_LEVEL ? "level" : "edge ",
-			 entry.polarity == IOAPIC_POL_LOW ? "low " : "high",
+			 entry.masked ? "disabled" : "enabled ",
+			 entry.is_level ? "level" : "edge ",
+			 entry.active_low ? "low " : "high",
 			 entry.vector, entry.irr, entry.delivery_status);
-		if (ir_entry->format)
+		if (entry.ir_format) {
 			printk(KERN_DEBUG "%s, remapped, I(%04X),  Z(%X)\n",
-			       buf, (ir_entry->index2 << 15) | ir_entry->index,
-			       ir_entry->zero);
-		else
-			printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n",
 			       buf,
-			       entry.dest_mode == IOAPIC_DEST_MODE_LOGICAL ?
-			       "logical " : "physical",
-			       entry.dest, entry.delivery_mode);
+			       (entry.ir_index_15 << 15) | entry.ir_index_0_14,
+				entry.ir_zero);
+		} else {
+			printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n", buf,
+			       entry.dest_mode_logical ? "logical " : "physical",
+			       entry.destid_0_7, entry.delivery_mode);
+		}
 	}
 }
 
@@ -1380,8 +1366,8 @@ void __init enable_IO_APIC(void)
 		/* If the interrupt line is enabled and in ExtInt mode
 		 * I have found the pin where the i8259 is connected.
 		 */
-		if ((entry.mask == 0) &&
-		    (entry.delivery_mode == APIC_DELIVERY_MODE_EXTINT)) {
+		if (!entry.masked &&
+		    entry.delivery_mode == APIC_DELIVERY_MODE_EXTINT) {
 			ioapic_i8259.apic = apic;
 			ioapic_i8259.pin  = pin;
 			goto found_i8259;
@@ -1425,12 +1411,12 @@ void native_restore_boot_irq_mode(void)
 		struct IO_APIC_route_entry entry;
 
 		memset(&entry, 0, sizeof(entry));
-		entry.mask		= IOAPIC_UNMASKED;
-		entry.trigger		= IOAPIC_EDGE;
-		entry.polarity		= IOAPIC_POL_HIGH;
-		entry.dest_mode		= IOAPIC_DEST_MODE_PHYSICAL;
+		entry.masked		= false;
+		entry.is_level		= false;
+		entry.active_low	= false;
+		entry.dest_mode_logical	= false;
 		entry.delivery_mode	= APIC_DELIVERY_MODE_EXTINT;
-		entry.dest		= read_apic_id();
+		entry.destid_0_7	= read_apic_id();
 
 		/*
 		 * Add it to the IO-APIC irq-routing table:
@@ -1709,13 +1695,13 @@ static bool io_apic_level_ack_pending(struct mp_chip_data *data)
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
 	for_each_irq_pin(entry, data->irq_2_pin) {
-		unsigned int reg;
+		struct IO_APIC_route_entry e;
 		int pin;
 
 		pin = entry->pin;
-		reg = io_apic_read(entry->apic, 0x10 + pin*2);
+		e.w1 = io_apic_read(entry->apic, 0x10 + pin*2);
 		/* Is the remote IRR bit set? */
-		if (reg & IO_APIC_REDIR_REMOTE_IRR) {
+		if (e.irr) {
 			raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 			return true;
 		}
@@ -1874,7 +1860,7 @@ static void ioapic_configure_entry(struct irq_data *irqd)
 	 * ioapic chip to verify that.
 	 */
 	if (irqd->chip == &ioapic_chip) {
-		mpd->entry.dest = cfg->dest_apicid;
+		mpd->entry.destid_0_7 = cfg->dest_apicid;
 		mpd->entry.vector = cfg->vector;
 	}
 	for_each_irq_pin(entry, mpd->irq_2_pin)
@@ -1932,7 +1918,7 @@ static int ioapic_irq_get_chip_state(struct irq_data *irqd,
 		 * irrelevant because the IO-APIC treats them as fire and
 		 * forget.
 		 */
-		if (rentry.irr && rentry.trigger) {
+		if (rentry.irr && rentry.is_level) {
 			*state = true;
 			break;
 		}
@@ -2057,12 +2043,12 @@ static inline void __init unlock_ExtINT_logic(void)
 
 	memset(&entry1, 0, sizeof(entry1));
 
-	entry1.dest_mode = IOAPIC_DEST_MODE_PHYSICAL;
-	entry1.mask = IOAPIC_UNMASKED;
-	entry1.dest = hard_smp_processor_id();
-	entry1.delivery_mode = APIC_DELIVERY_MODE_EXTINT;
-	entry1.polarity = entry0.polarity;
-	entry1.trigger = IOAPIC_EDGE;
+	entry1.dest_mode_logical	= true;
+	entry1.masked			= false;
+	entry1.destid_0_7		= hard_smp_processor_id();
+	entry1.delivery_mode		= APIC_DELIVERY_MODE_EXTINT;
+	entry1.active_low		= entry0.active_low;
+	entry1.is_level			= false;
 	entry1.vector = 0;
 
 	ioapic_write_entry(apic, pin, entry1);
@@ -2937,17 +2923,17 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 			   struct IO_APIC_route_entry *entry)
 {
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode = apic->delivery_mode;
-	entry->dest_mode     = apic->dest_mode_logical;
-	entry->dest	     = cfg->dest_apicid;
-	entry->vector	     = cfg->vector;
-	entry->trigger	     = data->is_level;
-	entry->polarity	     = data->active_low;
+	entry->delivery_mode	 = apic->delivery_mode;
+	entry->dest_mode_logical = apic->dest_mode_logical;
+	entry->destid_0_7	 = cfg->dest_apicid;
+	entry->vector		 = cfg->vector;
+	entry->is_level		 = data->is_level;
+	entry->active_low	 = data->active_low;
 	/*
 	 * Mask level triggered irqs. Edge triggered irqs are masked
 	 * by the irq core code in case they fire.
 	 */
-	entry->mask = data->is_level;
+	entry->masked		= data->is_level;
 }
 
 int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b0e5210..3d72ec7 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3687,11 +3687,11 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 		entry = info->ioapic.entry;
 		info->ioapic.entry = NULL;
 		memset(entry, 0, sizeof(*entry));
-		entry->vector	= index;
-		entry->trigger	= info->ioapic.is_level;
-		entry->polarity	= info->ioapic.active_low;
+		entry->vector		= index;
+		entry->is_level		= info->ioapic.is_level;
+		entry->active_low	= info->ioapic.active_low;
 		/* Mask level triggered irqs. */
-		entry->mask	= info->ioapic.is_level;
+		entry->masked		= info->ioapic.is_level;
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e09e2d7..1ab7eb9 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -52,7 +52,7 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
 		return ret;
 
 	entry = data->chip_data;
-	entry->dest = cfg->dest_apicid;
+	entry->destid_0_7 = cfg->dest_apicid;
 	entry->vector = cfg->vector;
 	send_cleanup_vector(cfg);
 
@@ -125,7 +125,7 @@ static int hyperv_irq_remapping_activate(struct irq_domain *domain,
 	struct irq_cfg *cfg = irqd_cfg(irq_data);
 	struct IO_APIC_route_entry *entry = irq_data->chip_data;
 
-	entry->dest = cfg->dest_apicid;
+	entry->destid_0_7 = cfg->dest_apicid;
 	entry->vector = cfg->vector;
 
 	return 0;
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 54ca693..625bdb9 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1279,8 +1279,8 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 					     struct irq_alloc_info *info,
 					     int index, int sub_handle)
 {
-	struct IR_IO_APIC_route_entry *entry;
 	struct irte *irte = &data->irte_entry;
+	struct IO_APIC_route_entry *entry;
 
 	prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
 	switch (info->type) {
@@ -1294,22 +1294,21 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 			irte->avail, irte->vector, irte->dest_id,
 			irte->sid, irte->sq, irte->svt);
 
-		entry = (struct IR_IO_APIC_route_entry *)info->ioapic.entry;
+		entry = info->ioapic.entry;
 		info->ioapic.entry = NULL;
 		memset(entry, 0, sizeof(*entry));
-		entry->index2	= (index >> 15) & 0x1;
-		entry->zero	= 0;
-		entry->format	= 1;
-		entry->index	= (index & 0x7fff);
+		entry->ir_index_15	= !!(index & 0x8000);
+		entry->ir_format	= true;
+		entry->ir_index_0_14	= index & 0x7fff;
 		/*
 		 * IO-APIC RTE will be configured with virtual vector.
 		 * irq handler will do the explicit EOI to the io-apic.
 		 */
-		entry->vector	= info->ioapic.pin;
-		entry->trigger	= info->ioapic.is_level;
-		entry->polarity	= info->ioapic.active_low;
+		entry->vector		= info->ioapic.pin;
+		entry->is_level		= info->ioapic.is_level;
+		entry->active_low	= info->ioapic.active_low;
 		/* Mask level triggered irqs. */
-		entry->mask	= info->ioapic.is_level;
+		entry->masked		= info->ioapic.is_level;
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-10-24 21:35                     ` [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  2020-11-09 23:15                         ` Tom Lendacky
  2020-11-10  6:31                       ` [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers Qian Cai
  1 sibling, 1 reply; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     a27dca645d2c0f31abb7858aa0e10b2fa0f2f659
Gitweb:        https://git.kernel.org/tip/a27dca645d2c0f31abb7858aa0e10b2fa0f2f659
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:19 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:26 +01:00

x86/io_apic: Cleanup trigger/polarity helpers

'trigger' and 'polarity' are used throughout the I/O-APIC code for handling
the trigger type (edge/level) and the active low/high configuration. While
there are defines for initializing these variables and struct members, they
are not used consequently and the meaning of 'trigger' and 'polarity' is
opaque and confusing at best.

Rename them to 'is_level' and 'active_low' and make them boolean in various
structs so it's entirely clear what the meaning is.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-20-dwmw2@infradead.org

---
 arch/x86/include/asm/hw_irq.h       |   6 +-
 arch/x86/kernel/apic/io_apic.c      | 244 ++++++++++++---------------
 arch/x86/pci/intel_mid_pci.c        |   8 +-
 drivers/iommu/amd/iommu.c           |  10 +-
 drivers/iommu/intel/irq_remapping.c |   9 +-
 5 files changed, 130 insertions(+), 147 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index a4aeeaa..517847a 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -47,9 +47,9 @@ enum irq_alloc_type {
 struct ioapic_alloc_info {
 	int				pin;
 	int				node;
-	u32				trigger : 1;
-	u32				polarity : 1;
-	u32				valid : 1;
+	u32				is_level	: 1;
+	u32				active_low	: 1;
+	u32				valid		: 1;
 	struct IO_APIC_route_entry	*entry;
 };
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index c6d92d2..24a7bba 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -89,12 +89,12 @@ struct irq_pin_list {
 };
 
 struct mp_chip_data {
-	struct list_head irq_2_pin;
-	struct IO_APIC_route_entry entry;
-	int trigger;
-	int polarity;
+	struct list_head		irq_2_pin;
+	struct IO_APIC_route_entry	entry;
+	bool				is_level;
+	bool				active_low;
+	bool				isa_irq;
 	u32 count;
-	bool isa_irq;
 };
 
 struct mp_ioapic_gsi {
@@ -745,44 +745,7 @@ static int __init find_isa_irq_apic(int irq, int type)
 	return -1;
 }
 
-#ifdef CONFIG_EISA
-/*
- * EISA Edge/Level control register, ELCR
- */
-static int EISA_ELCR(unsigned int irq)
-{
-	if (irq < nr_legacy_irqs()) {
-		unsigned int port = 0x4d0 + (irq >> 3);
-		return (inb(port) >> (irq & 7)) & 1;
-	}
-	apic_printk(APIC_VERBOSE, KERN_INFO
-			"Broken MPtable reports ISA irq %d\n", irq);
-	return 0;
-}
-
-#endif
-
-/* ISA interrupts are always active high edge triggered,
- * when listed as conforming in the MP table. */
-
-#define default_ISA_trigger(idx)	(IOAPIC_EDGE)
-#define default_ISA_polarity(idx)	(IOAPIC_POL_HIGH)
-
-/* EISA interrupts are always polarity zero and can be edge or level
- * trigger depending on the ELCR value.  If an interrupt is listed as
- * EISA conforming in the MP table, that means its trigger type must
- * be read in from the ELCR */
-
-#define default_EISA_trigger(idx)	(EISA_ELCR(mp_irqs[idx].srcbusirq))
-#define default_EISA_polarity(idx)	default_ISA_polarity(idx)
-
-/* PCI interrupts are always active low level triggered,
- * when listed as conforming in the MP table. */
-
-#define default_PCI_trigger(idx)	(IOAPIC_LEVEL)
-#define default_PCI_polarity(idx)	(IOAPIC_POL_LOW)
-
-static int irq_polarity(int idx)
+static bool irq_active_low(int idx)
 {
 	int bus = mp_irqs[idx].srcbus;
 
@@ -791,90 +754,139 @@ static int irq_polarity(int idx)
 	 */
 	switch (mp_irqs[idx].irqflag & MP_IRQPOL_MASK) {
 	case MP_IRQPOL_DEFAULT:
-		/* conforms to spec, ie. bus-type dependent polarity */
-		if (test_bit(bus, mp_bus_not_pci))
-			return default_ISA_polarity(idx);
-		else
-			return default_PCI_polarity(idx);
+		/*
+		 * Conforms to spec, ie. bus-type dependent polarity.  PCI
+		 * defaults to low active. [E]ISA defaults to high active.
+		 */
+		return !test_bit(bus, mp_bus_not_pci);
 	case MP_IRQPOL_ACTIVE_HIGH:
-		return IOAPIC_POL_HIGH;
+		return false;
 	case MP_IRQPOL_RESERVED:
 		pr_warn("IOAPIC: Invalid polarity: 2, defaulting to low\n");
 		fallthrough;
 	case MP_IRQPOL_ACTIVE_LOW:
 	default: /* Pointless default required due to do gcc stupidity */
-		return IOAPIC_POL_LOW;
+		return true;
 	}
 }
 
 #ifdef CONFIG_EISA
-static int eisa_irq_trigger(int idx, int bus, int trigger)
+/*
+ * EISA Edge/Level control register, ELCR
+ */
+static bool EISA_ELCR(unsigned int irq)
+{
+	if (irq < nr_legacy_irqs()) {
+		unsigned int port = 0x4d0 + (irq >> 3);
+		return (inb(port) >> (irq & 7)) & 1;
+	}
+	apic_printk(APIC_VERBOSE, KERN_INFO
+			"Broken MPtable reports ISA irq %d\n", irq);
+	return false;
+}
+
+/*
+ * EISA interrupts are always active high and can be edge or level
+ * triggered depending on the ELCR value.  If an interrupt is listed as
+ * EISA conforming in the MP table, that means its trigger type must be
+ * read in from the ELCR.
+ */
+static bool eisa_irq_is_level(int idx, int bus, bool level)
 {
 	switch (mp_bus_id_to_type[bus]) {
 	case MP_BUS_PCI:
 	case MP_BUS_ISA:
-		return trigger;
+		return level;
 	case MP_BUS_EISA:
-		return default_EISA_trigger(idx);
+		return EISA_ELCR(mp_irqs[idx].srcbusirq);
 	}
 	pr_warn("IOAPIC: Invalid srcbus: %d defaulting to level\n", bus);
-	return IOAPIC_LEVEL;
+	return true;
 }
 #else
-static inline int eisa_irq_trigger(int idx, int bus, int trigger)
+static inline int eisa_irq_is_level(int idx, int bus, bool level)
 {
-	return trigger;
+	return level;
 }
 #endif
 
-static int irq_trigger(int idx)
+static bool irq_is_level(int idx)
 {
 	int bus = mp_irqs[idx].srcbus;
-	int trigger;
+	bool level;
 
 	/*
 	 * Determine IRQ trigger mode (edge or level sensitive):
 	 */
 	switch (mp_irqs[idx].irqflag & MP_IRQTRIG_MASK) {
 	case MP_IRQTRIG_DEFAULT:
-		/* conforms to spec, ie. bus-type dependent trigger mode */
-		if (test_bit(bus, mp_bus_not_pci))
-			trigger = default_ISA_trigger(idx);
-		else
-			trigger = default_PCI_trigger(idx);
+		/*
+		 * Conforms to spec, ie. bus-type dependent trigger
+		 * mode. PCI defaults to egde, ISA to level.
+		 */
+		level = test_bit(bus, mp_bus_not_pci);
 		/* Take EISA into account */
-		return eisa_irq_trigger(idx, bus, trigger);
+		return eisa_irq_is_level(idx, bus, level);
 	case MP_IRQTRIG_EDGE:
-		return IOAPIC_EDGE;
+		return false;
 	case MP_IRQTRIG_RESERVED:
 		pr_warn("IOAPIC: Invalid trigger mode 2 defaulting to level\n");
 		fallthrough;
 	case MP_IRQTRIG_LEVEL:
 	default: /* Pointless default required due to do gcc stupidity */
-		return IOAPIC_LEVEL;
+		return true;
 	}
 }
 
+static int __acpi_get_override_irq(u32 gsi, bool *trigger, bool *polarity)
+{
+	int ioapic, pin, idx;
+
+	if (skip_ioapic_setup)
+		return -1;
+
+	ioapic = mp_find_ioapic(gsi);
+	if (ioapic < 0)
+		return -1;
+
+	pin = mp_find_ioapic_pin(ioapic, gsi);
+	if (pin < 0)
+		return -1;
+
+	idx = find_irq_entry(ioapic, pin, mp_INT);
+	if (idx < 0)
+		return -1;
+
+	*trigger = irq_is_level(idx);
+	*polarity = irq_active_low(idx);
+	return 0;
+}
+
+#ifdef CONFIG_ACPI
+int acpi_get_override_irq(u32 gsi, int *is_level, int *active_low)
+{
+	*is_level = *active_low = 0;
+	return __acpi_get_override_irq(gsi, (bool *)is_level,
+				       (bool *)active_low);
+}
+#endif
+
 void ioapic_set_alloc_attr(struct irq_alloc_info *info, int node,
 			   int trigger, int polarity)
 {
 	init_irq_alloc_info(info, NULL);
 	info->type = X86_IRQ_ALLOC_TYPE_IOAPIC;
 	info->ioapic.node = node;
-	info->ioapic.trigger = trigger;
-	info->ioapic.polarity = polarity;
+	info->ioapic.is_level = trigger;
+	info->ioapic.active_low = polarity;
 	info->ioapic.valid = 1;
 }
 
-#ifndef CONFIG_ACPI
-int acpi_get_override_irq(u32 gsi, int *trigger, int *polarity);
-#endif
-
 static void ioapic_copy_alloc_attr(struct irq_alloc_info *dst,
 				   struct irq_alloc_info *src,
 				   u32 gsi, int ioapic_idx, int pin)
 {
-	int trigger, polarity;
+	bool level, pol_low;
 
 	copy_irq_alloc_info(dst, src);
 	dst->type = X86_IRQ_ALLOC_TYPE_IOAPIC;
@@ -883,20 +895,20 @@ static void ioapic_copy_alloc_attr(struct irq_alloc_info *dst,
 	dst->ioapic.valid = 1;
 	if (src && src->ioapic.valid) {
 		dst->ioapic.node = src->ioapic.node;
-		dst->ioapic.trigger = src->ioapic.trigger;
-		dst->ioapic.polarity = src->ioapic.polarity;
+		dst->ioapic.is_level = src->ioapic.is_level;
+		dst->ioapic.active_low = src->ioapic.active_low;
 	} else {
 		dst->ioapic.node = NUMA_NO_NODE;
-		if (acpi_get_override_irq(gsi, &trigger, &polarity) >= 0) {
-			dst->ioapic.trigger = trigger;
-			dst->ioapic.polarity = polarity;
+		if (__acpi_get_override_irq(gsi, &level, &pol_low) >= 0) {
+			dst->ioapic.is_level = level;
+			dst->ioapic.active_low = pol_low;
 		} else {
 			/*
 			 * PCI interrupts are always active low level
 			 * triggered.
 			 */
-			dst->ioapic.trigger = IOAPIC_LEVEL;
-			dst->ioapic.polarity = IOAPIC_POL_LOW;
+			dst->ioapic.is_level = true;
+			dst->ioapic.active_low = true;
 		}
 	}
 }
@@ -906,12 +918,12 @@ static int ioapic_alloc_attr_node(struct irq_alloc_info *info)
 	return (info && info->ioapic.valid) ? info->ioapic.node : NUMA_NO_NODE;
 }
 
-static void mp_register_handler(unsigned int irq, unsigned long trigger)
+static void mp_register_handler(unsigned int irq, bool level)
 {
 	irq_flow_handler_t hdl;
 	bool fasteoi;
 
-	if (trigger) {
+	if (level) {
 		irq_set_status_flags(irq, IRQ_LEVEL);
 		fasteoi = true;
 	} else {
@@ -933,14 +945,14 @@ static bool mp_check_pin_attr(int irq, struct irq_alloc_info *info)
 	 * pin with real trigger and polarity attributes.
 	 */
 	if (irq < nr_legacy_irqs() && data->count == 1) {
-		if (info->ioapic.trigger != data->trigger)
-			mp_register_handler(irq, info->ioapic.trigger);
-		data->entry.trigger = data->trigger = info->ioapic.trigger;
-		data->entry.polarity = data->polarity = info->ioapic.polarity;
+		if (info->ioapic.is_level != data->is_level)
+			mp_register_handler(irq, info->ioapic.is_level);
+		data->entry.trigger = data->is_level = info->ioapic.is_level;
+		data->entry.polarity = data->active_low = info->ioapic.active_low;
 	}
 
-	return data->trigger == info->ioapic.trigger &&
-	       data->polarity == info->ioapic.polarity;
+	return data->is_level == info->ioapic.is_level &&
+	       data->active_low == info->ioapic.active_low;
 }
 
 static int alloc_irq_from_domain(struct irq_domain *domain, int ioapic, u32 gsi,
@@ -2179,9 +2191,9 @@ static inline void __init check_timer(void)
 			 * so only need to unmask if it is level-trigger
 			 * do we really have level trigger timer?
 			 */
-			int idx;
-			idx = find_irq_entry(apic1, pin1, mp_INT);
-			if (idx != -1 && irq_trigger(idx))
+			int idx = find_irq_entry(apic1, pin1, mp_INT);
+
+			if (idx != -1 && irq_is_level(idx))
 				unmask_ioapic_irq(irq_get_irq_data(0));
 		}
 		irq_domain_deactivate_irq(irq_data);
@@ -2588,30 +2600,6 @@ static int io_apic_get_version(int ioapic)
 	return reg_01.bits.version;
 }
 
-int acpi_get_override_irq(u32 gsi, int *trigger, int *polarity)
-{
-	int ioapic, pin, idx;
-
-	if (skip_ioapic_setup)
-		return -1;
-
-	ioapic = mp_find_ioapic(gsi);
-	if (ioapic < 0)
-		return -1;
-
-	pin = mp_find_ioapic_pin(ioapic, gsi);
-	if (pin < 0)
-		return -1;
-
-	idx = find_irq_entry(ioapic, pin, mp_INT);
-	if (idx < 0)
-		return -1;
-
-	*trigger = irq_trigger(idx);
-	*polarity = irq_polarity(idx);
-	return 0;
-}
-
 /*
  * This function updates target affinity of IOAPIC interrupts to include
  * the CPUs which came online during SMP bringup.
@@ -2935,13 +2923,13 @@ static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data,
 				  struct irq_alloc_info *info)
 {
 	if (info && info->ioapic.valid) {
-		data->trigger = info->ioapic.trigger;
-		data->polarity = info->ioapic.polarity;
-	} else if (acpi_get_override_irq(gsi, &data->trigger,
-					 &data->polarity) < 0) {
+		data->is_level = info->ioapic.is_level;
+		data->active_low = info->ioapic.active_low;
+	} else if (__acpi_get_override_irq(gsi, &data->is_level,
+					   &data->active_low) < 0) {
 		/* PCI interrupts are always active low level triggered. */
-		data->trigger = IOAPIC_LEVEL;
-		data->polarity = IOAPIC_POL_LOW;
+		data->is_level = true;
+		data->active_low = true;
 	}
 }
 
@@ -2953,16 +2941,13 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 	entry->dest_mode     = apic->dest_mode_logical;
 	entry->dest	     = cfg->dest_apicid;
 	entry->vector	     = cfg->vector;
-	entry->trigger	     = data->trigger;
-	entry->polarity	     = data->polarity;
+	entry->trigger	     = data->is_level;
+	entry->polarity	     = data->active_low;
 	/*
 	 * Mask level triggered irqs. Edge triggered irqs are masked
 	 * by the irq core code in case they fire.
 	 */
-	if (data->trigger == IOAPIC_LEVEL)
-		entry->mask = IOAPIC_MASKED;
-	else
-		entry->mask = IOAPIC_UNMASKED;
+	entry->mask = data->is_level;
 }
 
 int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
@@ -3010,7 +2995,7 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	local_irq_save(flags);
 	if (info->ioapic.entry)
 		mp_setup_entry(cfg, data, info->ioapic.entry);
-	mp_register_handler(virq, data->trigger);
+	mp_register_handler(virq, data->is_level);
 	if (virq < nr_legacy_irqs())
 		legacy_pic->mask(virq);
 	local_irq_restore(flags);
@@ -3018,7 +3003,8 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 	apic_printk(APIC_VERBOSE, KERN_DEBUG
 		    "IOAPIC[%d]: Set routing entry (%d-%d -> 0x%x -> IRQ %d Mode:%i Active:%i Dest:%d)\n",
 		    ioapic, mpc_ioapic_id(ioapic), pin, cfg->vector,
-		    virq, data->trigger, data->polarity, cfg->dest_apicid);
+		    virq, data->is_level, data->active_low,
+		    cfg->dest_apicid);
 
 	return 0;
 }
diff --git a/arch/x86/pci/intel_mid_pci.c b/arch/x86/pci/intel_mid_pci.c
index 24ca4ee..95e2e6b 100644
--- a/arch/x86/pci/intel_mid_pci.c
+++ b/arch/x86/pci/intel_mid_pci.c
@@ -215,7 +215,7 @@ static int pci_write(struct pci_bus *bus, unsigned int devfn, int where,
 static int intel_mid_pci_irq_enable(struct pci_dev *dev)
 {
 	struct irq_alloc_info info;
-	int polarity;
+	bool polarity_low;
 	int ret;
 	u8 gsi;
 
@@ -230,7 +230,7 @@ static int intel_mid_pci_irq_enable(struct pci_dev *dev)
 
 	switch (intel_mid_identify_cpu()) {
 	case INTEL_MID_CPU_CHIP_TANGIER:
-		polarity = IOAPIC_POL_HIGH;
+		polarity_low = false;
 
 		/* Special treatment for IRQ0 */
 		if (gsi == 0) {
@@ -252,11 +252,11 @@ static int intel_mid_pci_irq_enable(struct pci_dev *dev)
 		}
 		break;
 	default:
-		polarity = IOAPIC_POL_LOW;
+		polarity_low = true;
 		break;
 	}
 
-	ioapic_set_alloc_attr(&info, dev_to_node(&dev->dev), 1, polarity);
+	ioapic_set_alloc_attr(&info, dev_to_node(&dev->dev), 1, polarity_low);
 
 	/*
 	 * MRST only have IOAPIC, the PCI irq lines are 1:1 mapped to
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 473de09..b0e5210 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3687,13 +3687,11 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 		entry = info->ioapic.entry;
 		info->ioapic.entry = NULL;
 		memset(entry, 0, sizeof(*entry));
-		entry->vector        = index;
-		entry->mask          = 0;
-		entry->trigger       = info->ioapic.trigger;
-		entry->polarity      = info->ioapic.polarity;
+		entry->vector	= index;
+		entry->trigger	= info->ioapic.is_level;
+		entry->polarity	= info->ioapic.active_low;
 		/* Mask level triggered irqs. */
-		if (info->ioapic.trigger)
-			entry->mask = 1;
+		entry->mask	= info->ioapic.is_level;
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 30269b7..54ca693 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1306,11 +1306,10 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 		 * irq handler will do the explicit EOI to the io-apic.
 		 */
 		entry->vector	= info->ioapic.pin;
-		entry->mask	= 0;			/* enable IRQ */
-		entry->trigger	= info->ioapic.trigger;
-		entry->polarity	= info->ioapic.polarity;
-		if (info->ioapic.trigger)
-			entry->mask = 1; /* Mask level triggered irqs. */
+		entry->trigger	= info->ioapic.is_level;
+		entry->polarity	= info->ioapic.active_low;
+		/* Mask level triggered irqs. */
+		entry->mask	= info->ioapic.is_level;
 		break;
 
 	case X86_IRQ_ALLOC_TYPE_HPET:

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/msi: Remove msidef.h
  2020-10-24 21:35                     ` [PATCH v3 18/35] x86/msi: Remove msidef.h David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     0c1883c1eb9dfa3c72af6e00425eeb1eb171a03e
Gitweb:        https://git.kernel.org/tip/0c1883c1eb9dfa3c72af6e00425eeb1eb171a03e
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:18 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:26 +01:00

x86/msi: Remove msidef.h

Nothing uses the macro maze anymore.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-19-dwmw2@infradead.org

---
 arch/x86/include/asm/msidef.h | 57 +----------------------------------
 1 file changed, 57 deletions(-)
 delete mode 100644 arch/x86/include/asm/msidef.h

diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
deleted file mode 100644
index ee2f8cc..0000000
--- a/arch/x86/include/asm/msidef.h
+++ /dev/null
@@ -1,57 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_X86_MSIDEF_H
-#define _ASM_X86_MSIDEF_H
-
-/*
- * Constants for Intel APIC based MSI messages.
- */
-
-/*
- * Shifts for MSI data
- */
-
-#define MSI_DATA_VECTOR_SHIFT		0
-#define  MSI_DATA_VECTOR_MASK		0x000000ff
-#define	 MSI_DATA_VECTOR(v)		(((v) << MSI_DATA_VECTOR_SHIFT) & \
-					 MSI_DATA_VECTOR_MASK)
-
-#define MSI_DATA_DELIVERY_MODE_SHIFT	8
-#define  MSI_DATA_DELIVERY_FIXED	(0 << MSI_DATA_DELIVERY_MODE_SHIFT)
-#define  MSI_DATA_DELIVERY_LOWPRI	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
-
-#define MSI_DATA_LEVEL_SHIFT		14
-#define	 MSI_DATA_LEVEL_DEASSERT	(0 << MSI_DATA_LEVEL_SHIFT)
-#define	 MSI_DATA_LEVEL_ASSERT		(1 << MSI_DATA_LEVEL_SHIFT)
-
-#define MSI_DATA_TRIGGER_SHIFT		15
-#define  MSI_DATA_TRIGGER_EDGE		(0 << MSI_DATA_TRIGGER_SHIFT)
-#define  MSI_DATA_TRIGGER_LEVEL		(1 << MSI_DATA_TRIGGER_SHIFT)
-
-/*
- * Shift/mask fields for msi address
- */
-
-#define MSI_ADDR_BASE_HI		0
-#define MSI_ADDR_BASE_LO		0xfee00000
-
-#define MSI_ADDR_DEST_MODE_SHIFT	2
-#define  MSI_ADDR_DEST_MODE_PHYSICAL	(0 << MSI_ADDR_DEST_MODE_SHIFT)
-#define	 MSI_ADDR_DEST_MODE_LOGICAL	(1 << MSI_ADDR_DEST_MODE_SHIFT)
-
-#define MSI_ADDR_REDIRECTION_SHIFT	3
-#define  MSI_ADDR_REDIRECTION_CPU	(0 << MSI_ADDR_REDIRECTION_SHIFT)
-					/* dedicated cpu */
-#define  MSI_ADDR_REDIRECTION_LOWPRI	(1 << MSI_ADDR_REDIRECTION_SHIFT)
-					/* lowest priority */
-
-#define MSI_ADDR_DEST_ID_SHIFT		12
-#define	 MSI_ADDR_DEST_ID_MASK		0x00ffff0
-#define  MSI_ADDR_DEST_ID(dest)		(((dest) << MSI_ADDR_DEST_ID_SHIFT) & \
-					 MSI_ADDR_DEST_ID_MASK)
-#define MSI_ADDR_EXT_DEST_ID(dest)	((dest) & 0xffffff00)
-
-#define MSI_ADDR_IR_EXT_INT		(1 << 4)
-#define MSI_ADDR_IR_SHV			(1 << 3)
-#define MSI_ADDR_IR_INDEX1(index)	((index & 0x8000) >> 13)
-#define MSI_ADDR_IR_INDEX2(index)	((index & 0x7fff) << 5)
-#endif /* _ASM_X86_MSIDEF_H */

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/kvm: Use msi_msg shadow structs
  2020-10-24 21:35                     ` [PATCH v3 16/35] x86/kvm: " David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     485940e0e691d6d7874fe1fe3b9453c5af41aace
Gitweb:        https://git.kernel.org/tip/485940e0e691d6d7874fe1fe3b9453c5af41aace
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:16 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:26 +01:00

x86/kvm: Use msi_msg shadow structs

Use the bitfields in the x86 shadow structs instead of decomposing the
32bit value with macros.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-17-dwmw2@infradead.org

---
 arch/x86/kvm/irq_comm.c | 31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 4aa1c2e..8a4de3f 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -16,8 +16,6 @@
 
 #include <trace/events/kvm.h>
 
-#include <asm/msidef.h>
-
 #include "irq.h"
 
 #include "ioapic.h"
@@ -104,22 +102,19 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
 		     struct kvm_lapic_irq *irq)
 {
-	trace_kvm_msi_set_irq(e->msi.address_lo | (kvm->arch.x2apic_format ?
-	                                     (u64)e->msi.address_hi << 32 : 0),
-	                      e->msi.data);
-
-	irq->dest_id = (e->msi.address_lo &
-			MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
-	if (kvm->arch.x2apic_format)
-		irq->dest_id |= MSI_ADDR_EXT_DEST_ID(e->msi.address_hi);
-	irq->vector = (e->msi.data &
-			MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
-	irq->dest_mode = kvm_lapic_irq_dest_mode(
-	    !!((1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo));
-	irq->trig_mode = (1 << MSI_DATA_TRIGGER_SHIFT) & e->msi.data;
-	irq->delivery_mode = e->msi.data & 0x700;
-	irq->msi_redir_hint = ((e->msi.address_lo
-		& MSI_ADDR_REDIRECTION_LOWPRI) > 0);
+	struct msi_msg msg = { .address_lo = e->msi.address_lo,
+			       .address_hi = e->msi.address_hi,
+			       .data = e->msi.data };
+
+	trace_kvm_msi_set_irq(msg.address_lo | (kvm->arch.x2apic_format ?
+			      (u64)msg.address_hi << 32 : 0), msg.data);
+
+	irq->dest_id = x86_msi_msg_get_destid(&msg, kvm->arch.x2apic_format);
+	irq->vector = msg.arch_data.vector;
+	irq->dest_mode = kvm_lapic_irq_dest_mode(msg.arch_addr_lo.dest_mode_logical);
+	irq->trig_mode = msg.arch_data.is_level;
+	irq->delivery_mode = msg.arch_data.delivery_mode << 8;
+	irq->msi_redir_hint = msg.arch_addr_lo.redirect_hint;
 	irq->level = 1;
 	irq->shorthand = APIC_DEST_NOSHORT;
 }

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/pci/xen: Use msi_msg shadow structs
  2020-10-24 21:35                     ` [PATCH v3 17/35] x86/pci/xen: " David Woodhouse
  2020-10-25  9:49                       ` David Laight
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     41bb2115beec5e318095a89f5ad4a9c343cb21ad
Gitweb:        https://git.kernel.org/tip/41bb2115beec5e318095a89f5ad4a9c343cb21ad
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:17 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:26 +01:00

x86/pci/xen: Use msi_msg shadow structs

Use the msi_msg shadow structs and compose the message with named bitfields
instead of the unreadable macro maze.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-18-dwmw2@infradead.org

---
 arch/x86/pci/xen.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index c552cd2..3d41a09 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -152,7 +152,6 @@ static int acpi_register_gsi_xen(struct device *dev, u32 gsi,
 
 #if defined(CONFIG_PCI_MSI)
 #include <linux/msi.h>
-#include <asm/msidef.h>
 
 struct xen_pci_frontend_ops *xen_pci_frontend;
 EXPORT_SYMBOL_GPL(xen_pci_frontend);
@@ -210,23 +209,20 @@ free:
 	return ret;
 }
 
-#define XEN_PIRQ_MSI_DATA  (MSI_DATA_TRIGGER_EDGE | \
-		MSI_DATA_LEVEL_ASSERT | (3 << 8) | MSI_DATA_VECTOR(0))
-
 static void xen_msi_compose_msg(struct pci_dev *pdev, unsigned int pirq,
 		struct msi_msg *msg)
 {
-	/* We set vector == 0 to tell the hypervisor we don't care about it,
-	 * but we want a pirq setup instead.
-	 * We use the dest_id field to pass the pirq that we want. */
-	msg->address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(pirq);
-	msg->address_lo =
-		MSI_ADDR_BASE_LO |
-		MSI_ADDR_DEST_MODE_PHYSICAL |
-		MSI_ADDR_REDIRECTION_CPU |
-		MSI_ADDR_DEST_ID(pirq);
-
-	msg->data = XEN_PIRQ_MSI_DATA;
+	/*
+	 * We set vector == 0 to tell the hypervisor we don't care about
+	 * it, but we want a pirq setup instead.  We use the dest_id fields
+	 * to pass the pirq that we want.
+	 */
+	memset(msg, 0, sizeof(*msg));
+	msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
+	msg->arch_addr_hi.destid_8_31 = pirq >> 8;
+	msg->arch_addr_lo.destid_0_7 = pirq & 0xFF;
+	msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
+	msg->arch_data.delivery_mode = APIC_DELIVERY_MODE_EXTINT;
 }
 
 static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] PCI: vmd: Use msi_msg shadow structs
  2020-10-24 21:35                     ` [PATCH v3 15/35] PCI: vmd: " David Woodhouse
  2020-10-28 20:49                       ` Kees Cook
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     e16c8058a10ba8e38d0d1ad0b64e444b245ffdbd
Gitweb:        https://git.kernel.org/tip/e16c8058a10ba8e38d0d1ad0b64e444b245ffdbd
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:15 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:26 +01:00

PCI: vmd: Use msi_msg shadow structs

Use the x86 shadow structs in msi_msg instead of the macros.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-16-dwmw2@infradead.org

---
 drivers/pci/controller/vmd.c |  9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index f375c21..6f87954 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -18,7 +18,6 @@
 #include <asm/irqdomain.h>
 #include <asm/device.h>
 #include <asm/msi.h>
-#include <asm/msidef.h>
 
 #define VMD_CFGBAR	0
 #define VMD_MEMBAR1	2
@@ -131,10 +130,10 @@ static void vmd_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	struct vmd_irq_list *irq = vmdirq->irq;
 	struct vmd_dev *vmd = irq_data_get_irq_handler_data(data);
 
-	msg->address_hi = MSI_ADDR_BASE_HI;
-	msg->address_lo = MSI_ADDR_BASE_LO |
-			  MSI_ADDR_DEST_ID(index_from_irqs(vmd, irq));
-	msg->data = 0;
+	memset(msg, 0, sizeof(*msg));
+	msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
+	msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
+	msg->arch_addr_lo.destid_0_7 = index_from_irqs(vmd, irq);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] iommu/amd: Use msi_msg shadow structs
  2020-10-24 21:35                     ` [PATCH v3 14/35] iommu/amd: " David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     b5c3786ee3704bd8cd5b29ae168526f2b1af4557
Gitweb:        https://git.kernel.org/tip/b5c3786ee3704bd8cd5b29ae168526f2b1af4557
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:14 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:26 +01:00

iommu/amd: Use msi_msg shadow structs

Get rid of the macro mess and use the shadow structs for the x86 specific
MSI message format. Convert the intcapxt setup to use named bitfields as
well while touching it anyway.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-15-dwmw2@infradead.org

---
 drivers/iommu/amd/init.c  | 46 +++++++++++++++++++++-----------------
 drivers/iommu/amd/iommu.c | 14 +++++++-----
 2 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 82e4af8..263670d 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -23,7 +23,6 @@
 #include <asm/pci-direct.h>
 #include <asm/iommu.h>
 #include <asm/apic.h>
-#include <asm/msidef.h>
 #include <asm/gart.h>
 #include <asm/x86_init.h>
 #include <asm/iommu_table.h>
@@ -1966,10 +1965,16 @@ static int iommu_setup_msi(struct amd_iommu *iommu)
 	return 0;
 }
 
-#define XT_INT_DEST_MODE(x)	(((x) & 0x1ULL) << 2)
-#define XT_INT_DEST_LO(x)	(((x) & 0xFFFFFFULL) << 8)
-#define XT_INT_VEC(x)		(((x) & 0xFFULL) << 32)
-#define XT_INT_DEST_HI(x)	((((x) >> 24) & 0xFFULL) << 56)
+union intcapxt {
+	u64	capxt;
+	u64	reserved_0		:  2,
+		dest_mode_logical	:  1,
+		reserved_1		:  5,
+		destid_0_23		: 24,
+		vector			:  8,
+		reserved_2		: 16,
+		destid_24_31		:  8;
+} __attribute__ ((packed));
 
 /*
  * Setup the IntCapXT registers with interrupt routing information
@@ -1978,28 +1983,29 @@ static int iommu_setup_msi(struct amd_iommu *iommu)
  */
 static void iommu_update_intcapxt(struct amd_iommu *iommu)
 {
-	u64 val;
-	u32 addr_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET);
-	u32 addr_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET);
-	u32 data    = readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET);
-	bool dm     = (addr_lo >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
-	u32 dest    = ((addr_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xFF);
+	struct msi_msg msg;
+	union intcapxt xt;
+	u32 destid;
 
-	if (x2apic_enabled())
-		dest |= MSI_ADDR_EXT_DEST_ID(addr_hi);
+	msg.address_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET);
+	msg.address_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET);
+	msg.data = readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET);
 
-	val = XT_INT_VEC(data & 0xFF) |
-	      XT_INT_DEST_MODE(dm) |
-	      XT_INT_DEST_LO(dest) |
-	      XT_INT_DEST_HI(dest);
+	destid = x86_msi_msg_get_destid(&msg, x2apic_enabled());
+
+	xt.capxt = 0ULL;
+	xt.dest_mode_logical = msg.arch_data.dest_mode_logical;
+	xt.vector = msg.arch_data.vector;
+	xt.destid_0_23 = destid & GENMASK(23, 0);
+	xt.destid_24_31 = destid >> 24;
 
 	/**
 	 * Current IOMMU implemtation uses the same IRQ for all
 	 * 3 IOMMU interrupts.
 	 */
-	writeq(val, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
-	writeq(val, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
-	writeq(val, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
+	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
+	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
+	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
 }
 
 static void _irq_notifier_notify(struct irq_affinity_notify *notify,
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index d7f0c89..473de09 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -35,7 +35,6 @@
 #include <asm/io_apic.h>
 #include <asm/apic.h>
 #include <asm/hw_irq.h>
-#include <asm/msidef.h>
 #include <asm/proto.h>
 #include <asm/iommu.h>
 #include <asm/gart.h>
@@ -3656,13 +3655,20 @@ struct irq_remap_ops amd_iommu_irq_ops = {
 	.get_irq_domain		= get_irq_domain,
 };
 
+static void fill_msi_msg(struct msi_msg *msg, u32 index)
+{
+	msg->data = index;
+	msg->address_lo = 0;
+	msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
+	msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
+}
+
 static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 				       struct irq_cfg *irq_cfg,
 				       struct irq_alloc_info *info,
 				       int devid, int index, int sub_handle)
 {
 	struct irq_2_irte *irte_info = &data->irq_2_irte;
-	struct msi_msg *msg = &data->msi_entry;
 	struct IO_APIC_route_entry *entry;
 	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
@@ -3693,9 +3699,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 	case X86_IRQ_ALLOC_TYPE_HPET:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		msg->address_hi = MSI_ADDR_BASE_HI;
-		msg->address_lo = MSI_ADDR_BASE_LO;
-		msg->data = irte_info->index;
+		fill_msi_msg(&data->msi_entry, irte_info->index);
 		break;
 
 	default:

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/msi: Provide msi message shadow structs
  2020-10-24 21:35                     ` [PATCH v3 12/35] x86/msi: Provide msi message shadow structs David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  2022-04-06  8:36                       ` [PATCH v3 12/35] " Reto Buerki
  1 sibling, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     6285aa507366729c618d5295fb540b24a956088a
Gitweb:        https://git.kernel.org/tip/6285aa507366729c618d5295fb540b24a956088a
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:12 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:25 +01:00

x86/msi: Provide msi message shadow structs

Create shadow structs with named bitfields for msi_msg data, address_lo and
address_hi and use them in the MSI message composer.

Provide a function to retrieve the destination ID. This could be inline,
but that'd create a circular header dependency.

[dwmw2: fix bitfields not all to be a union]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-13-dwmw2@infradead.org

---
 arch/x86/include/asm/msi.h  | 49 ++++++++++++++++++++++++++++++++++++-
 arch/x86/kernel/apic/apic.c | 35 ++++++++++++++------------
 2 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index cd30013..322fd90 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -9,4 +9,53 @@ typedef struct irq_alloc_info msi_alloc_info_t;
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 		    msi_alloc_info_t *arg);
 
+/* Structs and defines for the X86 specific MSI message format */
+
+typedef struct x86_msi_data {
+	u32	vector			:  8,
+		delivery_mode		:  3,
+		dest_mode_logical	:  1,
+		reserved		:  2,
+		active_low		:  1,
+		is_level		:  1;
+
+	u32	dmar_subhandle;
+} __attribute__ ((packed)) arch_msi_msg_data_t;
+#define arch_msi_msg_data	x86_msi_data
+
+typedef struct x86_msi_addr_lo {
+	union {
+		struct {
+			u32	reserved_0		:  2,
+				dest_mode_logical	:  1,
+				redirect_hint		:  1,
+				reserved_1		:  8,
+				destid_0_7		:  8,
+				base_address		: 12;
+		};
+		struct {
+			u32	dmar_reserved_0		:  2,
+				dmar_index_15		:  1,
+				dmar_subhandle_valid	:  1,
+				dmar_format		:  1,
+				dmar_index_0_14		: 15,
+				dmar_base_address	: 12;
+		};
+	};
+} __attribute__ ((packed)) arch_msi_msg_addr_lo_t;
+#define arch_msi_msg_addr_lo	x86_msi_addr_lo
+
+#define X86_MSI_BASE_ADDRESS_LOW	(0xfee00000 >> 20)
+
+typedef struct x86_msi_addr_hi {
+	u32	reserved		:  8,
+		destid_8_31		: 24;
+} __attribute__ ((packed)) arch_msi_msg_addr_hi_t;
+#define arch_msi_msg_addr_hi	x86_msi_addr_hi
+
+#define X86_MSI_BASE_ADDRESS_HIGH	(0)
+
+struct msi_msg;
+u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid);
+
 #endif /* _ASM_X86_MSI_H */
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 4c15bf2..f7196ee 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -50,7 +50,6 @@
 #include <asm/io_apic.h>
 #include <asm/desc.h>
 #include <asm/hpet.h>
-#include <asm/msidef.h>
 #include <asm/mtrr.h>
 #include <asm/time.h>
 #include <asm/smp.h>
@@ -2484,22 +2483,16 @@ int hard_smp_processor_id(void)
 void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 			   bool dmar)
 {
-	msg->address_hi = MSI_ADDR_BASE_HI;
+	memset(msg, 0, sizeof(*msg));
 
-	msg->address_lo =
-		MSI_ADDR_BASE_LO |
-		(apic->dest_mode_logical ?
-			MSI_ADDR_DEST_MODE_LOGICAL :
-			MSI_ADDR_DEST_MODE_PHYSICAL) |
-		MSI_ADDR_REDIRECTION_CPU |
-		MSI_ADDR_DEST_ID(cfg->dest_apicid);
+	msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
+	msg->arch_addr_lo.dest_mode_logical = apic->dest_mode_logical;
+	msg->arch_addr_lo.destid_0_7 = cfg->dest_apicid & 0xFF;
 
-	msg->data =
-		MSI_DATA_TRIGGER_EDGE |
-		MSI_DATA_LEVEL_ASSERT |
-		MSI_DATA_DELIVERY_FIXED |
-		MSI_DATA_VECTOR(cfg->vector);
+	msg->arch_data.delivery_mode = APIC_DELIVERY_MODE_FIXED;
+	msg->arch_data.vector = cfg->vector;
 
+	msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
 	/*
 	 * Only the IOMMU itself can use the trick of putting destination
 	 * APIC ID into the high bits of the address. Anything else would
@@ -2507,11 +2500,21 @@ void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 	 * address higher APIC IDs.
 	 */
 	if (dmar)
-		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+		msg->arch_addr_hi.destid_8_31 = cfg->dest_apicid >> 8;
 	else
-		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
+		WARN_ON_ONCE(cfg->dest_apicid > 0xFF);
 }
 
+u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid)
+{
+	u32 dest = msg->arch_addr_lo.destid_0_7;
+
+	if (extid)
+		dest |= msg->arch_addr_hi.destid_8_31 << 8;
+	return dest;
+}
+EXPORT_SYMBOL_GPL(x86_msi_msg_get_destid);
+
 /*
  * Override the generic EOI implementation with an optimized version.
  * Only called during early boot when only one CPU is active and with

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] iommu/intel: Use msi_msg shadow structs
  2020-10-24 21:35                     ` [PATCH v3 13/35] iommu/intel: Use msi_msg " David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     5c0d0e2cc6e0e7a96c25351fd67c775e7b1f11f0
Gitweb:        https://git.kernel.org/tip/5c0d0e2cc6e0e7a96c25351fd67c775e7b1f11f0
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:13 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:25 +01:00

iommu/intel: Use msi_msg shadow structs

Use the bitfields in the x86 shadow struct to compose the MSI message.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-14-dwmw2@infradead.org

---
 drivers/iommu/intel/irq_remapping.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 5628d43..30269b7 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -20,7 +20,6 @@
 #include <asm/cpu.h>
 #include <asm/irq_remapping.h>
 #include <asm/pci-direct.h>
-#include <asm/msidef.h>
 
 #include "../irq_remapping.h"
 
@@ -1260,6 +1259,21 @@ static struct irq_chip intel_ir_chip = {
 	.irq_set_vcpu_affinity	= intel_ir_set_vcpu_affinity,
 };
 
+static void fill_msi_msg(struct msi_msg *msg, u32 index, u32 subhandle)
+{
+	memset(msg, 0, sizeof(*msg));
+
+	msg->arch_addr_lo.dmar_base_address = X86_MSI_BASE_ADDRESS_LOW;
+	msg->arch_addr_lo.dmar_subhandle_valid = true;
+	msg->arch_addr_lo.dmar_format = true;
+	msg->arch_addr_lo.dmar_index_0_14 = index & 0x7FFF;
+	msg->arch_addr_lo.dmar_index_15 = !!(index & 0x8000);
+
+	msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
+
+	msg->arch_data.dmar_subhandle = subhandle;
+}
+
 static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 					     struct irq_cfg *irq_cfg,
 					     struct irq_alloc_info *info,
@@ -1267,7 +1281,6 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 {
 	struct IR_IO_APIC_route_entry *entry;
 	struct irte *irte = &data->irte_entry;
-	struct msi_msg *msg = &data->msi_entry;
 
 	prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
 	switch (info->type) {
@@ -1308,12 +1321,7 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 		else
 			set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
 
-		msg->address_hi = MSI_ADDR_BASE_HI;
-		msg->data = sub_handle;
-		msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
-				  MSI_ADDR_IR_SHV |
-				  MSI_ADDR_IR_INDEX1(index) |
-				  MSI_ADDR_IR_INDEX2(index);
+		fill_msi_msg(&data->msi_entry, index, sub_handle);
 		break;
 
 	default:

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/hpet: Move MSI support into hpet.c
  2020-10-24 21:35                     ` [PATCH v3 10/35] x86/hpet: Move MSI support into hpet.c David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     3d7295eb3003aea9f89de35304b3a88ae4d5036b
Gitweb:        https://git.kernel.org/tip/3d7295eb3003aea9f89de35304b3a88ae4d5036b
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:10 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:25 +01:00

x86/hpet: Move MSI support into hpet.c

This isn't really dependent on PCI MSI; it's just generic MSI which is now
supported by the generic x86_vector_domain. Move the HPET MSI support back
into hpet.c with the rest of the HPET support.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-11-dwmw2@infradead.org

---
 arch/x86/include/asm/hpet.h |  11 +---
 arch/x86/kernel/apic/msi.c  | 111 +---------------------------------
 arch/x86/kernel/hpet.c      | 118 +++++++++++++++++++++++++++++++++--
 3 files changed, 112 insertions(+), 128 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 6352dee..ab9f3dd 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -74,17 +74,6 @@ extern void hpet_disable(void);
 extern unsigned int hpet_readl(unsigned int a);
 extern void force_hpet_resume(void);
 
-struct irq_data;
-struct hpet_channel;
-struct irq_domain;
-
-extern void hpet_msi_unmask(struct irq_data *data);
-extern void hpet_msi_mask(struct irq_data *data);
-extern void hpet_msi_write(struct hpet_channel *hc, struct msi_msg *msg);
-extern struct irq_domain *hpet_create_irq_domain(int hpet_id);
-extern int hpet_assign_irq(struct irq_domain *domain,
-			   struct hpet_channel *hc, int dev_num);
-
 #ifdef CONFIG_HPET_EMULATE_RTC
 
 #include <linux/interrupt.h>
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 4eda617..44ebe25 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -340,114 +340,3 @@ void dmar_free_hwirq(int irq)
 	irq_domain_free_irqs(irq, 1);
 }
 #endif
-
-/*
- * MSI message composition
- */
-#ifdef CONFIG_HPET_TIMER
-static inline int hpet_dev_id(struct irq_domain *domain)
-{
-	struct msi_domain_info *info = msi_get_domain_info(domain);
-
-	return (int)(long)info->data;
-}
-
-static void hpet_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
-{
-	hpet_msi_write(irq_data_get_irq_handler_data(data), msg);
-}
-
-static struct irq_chip hpet_msi_controller __ro_after_init = {
-	.name = "HPET-MSI",
-	.irq_unmask = hpet_msi_unmask,
-	.irq_mask = hpet_msi_mask,
-	.irq_ack = irq_chip_ack_parent,
-	.irq_set_affinity = msi_domain_set_affinity,
-	.irq_retrigger = irq_chip_retrigger_hierarchy,
-	.irq_write_msi_msg = hpet_msi_write_msg,
-	.flags = IRQCHIP_SKIP_SET_WAKE,
-};
-
-static int hpet_msi_init(struct irq_domain *domain,
-			 struct msi_domain_info *info, unsigned int virq,
-			 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
-{
-	irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
-	irq_domain_set_info(domain, virq, arg->hwirq, info->chip, NULL,
-			    handle_edge_irq, arg->data, "edge");
-
-	return 0;
-}
-
-static void hpet_msi_free(struct irq_domain *domain,
-			  struct msi_domain_info *info, unsigned int virq)
-{
-	irq_clear_status_flags(virq, IRQ_MOVE_PCNTXT);
-}
-
-static struct msi_domain_ops hpet_msi_domain_ops = {
-	.msi_init	= hpet_msi_init,
-	.msi_free	= hpet_msi_free,
-};
-
-static struct msi_domain_info hpet_msi_domain_info = {
-	.ops		= &hpet_msi_domain_ops,
-	.chip		= &hpet_msi_controller,
-	.flags		= MSI_FLAG_USE_DEF_DOM_OPS,
-};
-
-struct irq_domain *hpet_create_irq_domain(int hpet_id)
-{
-	struct msi_domain_info *domain_info;
-	struct irq_domain *parent, *d;
-	struct irq_alloc_info info;
-	struct fwnode_handle *fn;
-
-	if (x86_vector_domain == NULL)
-		return NULL;
-
-	domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
-	if (!domain_info)
-		return NULL;
-
-	*domain_info = hpet_msi_domain_info;
-	domain_info->data = (void *)(long)hpet_id;
-
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-	info.devid = hpet_id;
-	parent = irq_remapping_get_irq_domain(&info);
-	if (parent == NULL)
-		parent = x86_vector_domain;
-	else
-		hpet_msi_controller.name = "IR-HPET-MSI";
-
-	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
-					      hpet_id);
-	if (!fn) {
-		kfree(domain_info);
-		return NULL;
-	}
-
-	d = msi_create_irq_domain(fn, domain_info, parent);
-	if (!d) {
-		irq_domain_free_fwnode(fn);
-		kfree(domain_info);
-	}
-	return d;
-}
-
-int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc,
-		    int dev_num)
-{
-	struct irq_alloc_info info;
-
-	init_irq_alloc_info(&info, NULL);
-	info.type = X86_IRQ_ALLOC_TYPE_HPET;
-	info.data = hc;
-	info.devid = hpet_dev_id(domain);
-	info.hwirq = dev_num;
-
-	return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
-}
-#endif
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 7a50f0b..3b8b127 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -7,6 +7,7 @@
 #include <linux/cpu.h>
 #include <linux/irq.h>
 
+#include <asm/irq_remapping.h>
 #include <asm/hpet.h>
 #include <asm/time.h>
 
@@ -50,7 +51,7 @@ unsigned long				hpet_address;
 u8					hpet_blockid; /* OS timer block num */
 bool					hpet_msi_disable;
 
-#ifdef CONFIG_PCI_MSI
+#ifdef CONFIG_GENERIC_MSI_IRQ
 static DEFINE_PER_CPU(struct hpet_channel *, cpu_hpet_channel);
 static struct irq_domain		*hpet_domain;
 #endif
@@ -467,9 +468,8 @@ static void __init hpet_legacy_clockevent_register(struct hpet_channel *hc)
 /*
  * HPET MSI Support
  */
-#ifdef CONFIG_PCI_MSI
-
-void hpet_msi_unmask(struct irq_data *data)
+#ifdef CONFIG_GENERIC_MSI_IRQ
+static void hpet_msi_unmask(struct irq_data *data)
 {
 	struct hpet_channel *hc = irq_data_get_irq_handler_data(data);
 	unsigned int cfg;
@@ -479,7 +479,7 @@ void hpet_msi_unmask(struct irq_data *data)
 	hpet_writel(cfg, HPET_Tn_CFG(hc->num));
 }
 
-void hpet_msi_mask(struct irq_data *data)
+static void hpet_msi_mask(struct irq_data *data)
 {
 	struct hpet_channel *hc = irq_data_get_irq_handler_data(data);
 	unsigned int cfg;
@@ -489,12 +489,118 @@ void hpet_msi_mask(struct irq_data *data)
 	hpet_writel(cfg, HPET_Tn_CFG(hc->num));
 }
 
-void hpet_msi_write(struct hpet_channel *hc, struct msi_msg *msg)
+static void hpet_msi_write(struct hpet_channel *hc, struct msi_msg *msg)
 {
 	hpet_writel(msg->data, HPET_Tn_ROUTE(hc->num));
 	hpet_writel(msg->address_lo, HPET_Tn_ROUTE(hc->num) + 4);
 }
 
+static void hpet_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	hpet_msi_write(irq_data_get_irq_handler_data(data), msg);
+}
+
+static struct irq_chip hpet_msi_controller __ro_after_init = {
+	.name = "HPET-MSI",
+	.irq_unmask = hpet_msi_unmask,
+	.irq_mask = hpet_msi_mask,
+	.irq_ack = irq_chip_ack_parent,
+	.irq_set_affinity = msi_domain_set_affinity,
+	.irq_retrigger = irq_chip_retrigger_hierarchy,
+	.irq_write_msi_msg = hpet_msi_write_msg,
+	.flags = IRQCHIP_SKIP_SET_WAKE,
+};
+
+static int hpet_msi_init(struct irq_domain *domain,
+			 struct msi_domain_info *info, unsigned int virq,
+			 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
+{
+	irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
+	irq_domain_set_info(domain, virq, arg->hwirq, info->chip, NULL,
+			    handle_edge_irq, arg->data, "edge");
+
+	return 0;
+}
+
+static void hpet_msi_free(struct irq_domain *domain,
+			  struct msi_domain_info *info, unsigned int virq)
+{
+	irq_clear_status_flags(virq, IRQ_MOVE_PCNTXT);
+}
+
+static struct msi_domain_ops hpet_msi_domain_ops = {
+	.msi_init	= hpet_msi_init,
+	.msi_free	= hpet_msi_free,
+};
+
+static struct msi_domain_info hpet_msi_domain_info = {
+	.ops		= &hpet_msi_domain_ops,
+	.chip		= &hpet_msi_controller,
+	.flags		= MSI_FLAG_USE_DEF_DOM_OPS,
+};
+
+static struct irq_domain *hpet_create_irq_domain(int hpet_id)
+{
+	struct msi_domain_info *domain_info;
+	struct irq_domain *parent, *d;
+	struct irq_alloc_info info;
+	struct fwnode_handle *fn;
+
+	if (x86_vector_domain == NULL)
+		return NULL;
+
+	domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
+	if (!domain_info)
+		return NULL;
+
+	*domain_info = hpet_msi_domain_info;
+	domain_info->data = (void *)(long)hpet_id;
+
+	init_irq_alloc_info(&info, NULL);
+	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
+	info.devid = hpet_id;
+	parent = irq_remapping_get_irq_domain(&info);
+	if (parent == NULL)
+		parent = x86_vector_domain;
+	else
+		hpet_msi_controller.name = "IR-HPET-MSI";
+
+	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
+					      hpet_id);
+	if (!fn) {
+		kfree(domain_info);
+		return NULL;
+	}
+
+	d = msi_create_irq_domain(fn, domain_info, parent);
+	if (!d) {
+		irq_domain_free_fwnode(fn);
+		kfree(domain_info);
+	}
+	return d;
+}
+
+static inline int hpet_dev_id(struct irq_domain *domain)
+{
+	struct msi_domain_info *info = msi_get_domain_info(domain);
+
+	return (int)(long)info->data;
+}
+
+static int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc,
+			   int dev_num)
+{
+	struct irq_alloc_info info;
+
+	init_irq_alloc_info(&info, NULL);
+	info.type = X86_IRQ_ALLOC_TYPE_HPET;
+	info.data = hc;
+	info.devid = hpet_dev_id(domain);
+	info.hwirq = dev_num;
+
+	return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
+}
+
 static int hpet_clkevt_msi_resume(struct clock_event_device *evt)
 {
 	struct hpet_channel *hc = clockevent_to_channel(evt);

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] genirq/msi: Allow shadow declarations of msi_msg:: $member
  2020-10-24 21:35                     ` [PATCH v3 11/35] genirq/msi: Allow shadow declarations of msi_msg::$member David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     8073c1ac82c12aaf1b475a3ce5328d43b3eaa4ae
Gitweb:        https://git.kernel.org/tip/8073c1ac82c12aaf1b475a3ce5328d43b3eaa4ae
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:11 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:25 +01:00

genirq/msi: Allow shadow declarations of msi_msg:: $member

Architectures like x86 have their MSI messages in various bits of the data,
address_lo and address_hi field. Composing or decomposing these messages
with bitmasks and shifts is possible, but unreadable gunk.

Allow architectures to provide an architecture specific representation for
each member of msi_msg. Provide empty defaults for each and stick them into
an union.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-12-dwmw2@infradead.org

---
 include/asm-generic/msi.h |  4 +++-
 include/linux/msi.h       | 46 ++++++++++++++++++++++++++++++++++----
 2 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/msi.h b/include/asm-generic/msi.h
index e6795f0..25344de 100644
--- a/include/asm-generic/msi.h
+++ b/include/asm-generic/msi.h
@@ -4,6 +4,8 @@
 
 #include <linux/types.h>
 
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+
 #ifndef NUM_MSI_ALLOC_SCRATCHPAD_REGS
 # define NUM_MSI_ALLOC_SCRATCHPAD_REGS	2
 #endif
@@ -30,4 +32,6 @@ typedef struct msi_alloc_info {
 
 #define GENERIC_MSI_DOMAIN_OPS		1
 
+#endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
+
 #endif
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 6b584cc..360a0a7 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -4,11 +4,50 @@
 
 #include <linux/kobject.h>
 #include <linux/list.h>
+#include <asm/msi.h>
+
+/* Dummy shadow structures if an architecture does not define them */
+#ifndef arch_msi_msg_addr_lo
+typedef struct arch_msi_msg_addr_lo {
+	u32	address_lo;
+} __attribute__ ((packed)) arch_msi_msg_addr_lo_t;
+#endif
+
+#ifndef arch_msi_msg_addr_hi
+typedef struct arch_msi_msg_addr_hi {
+	u32	address_hi;
+} __attribute__ ((packed)) arch_msi_msg_addr_hi_t;
+#endif
+
+#ifndef arch_msi_msg_data
+typedef struct arch_msi_msg_data {
+	u32	data;
+} __attribute__ ((packed)) arch_msi_msg_data_t;
+#endif
 
+/**
+ * msi_msg - Representation of a MSI message
+ * @address_lo:		Low 32 bits of msi message address
+ * @arch_addrlo:	Architecture specific shadow of @address_lo
+ * @address_hi:		High 32 bits of msi message address
+ *			(only used when device supports it)
+ * @arch_addrhi:	Architecture specific shadow of @address_hi
+ * @data:		MSI message data (usually 16 bits)
+ * @arch_data:		Architecture specific shadow of @data
+ */
 struct msi_msg {
-	u32	address_lo;	/* low 32 bits of msi message address */
-	u32	address_hi;	/* high 32 bits of msi message address */
-	u32	data;		/* 16 bits of msi message data */
+	union {
+		u32			address_lo;
+		arch_msi_msg_addr_lo_t	arch_addr_lo;
+	};
+	union {
+		u32			address_hi;
+		arch_msi_msg_addr_hi_t	arch_addr_hi;
+	};
+	union {
+		u32			data;
+		arch_msi_msg_data_t	arch_data;
+	};
 };
 
 extern int pci_msi_ignore_mask;
@@ -243,7 +282,6 @@ struct msi_controller {
 #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
 
 #include <linux/irqhandler.h>
-#include <asm/msi.h>
 
 struct irq_domain;
 struct irq_domain_ops;

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/apic: Always provide irq_compose_msi_msg() method for vector domain
  2020-10-24 21:35                     ` [PATCH v3 09/35] x86/apic: Always provide irq_compose_msi_msg() method for vector domain David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     f598181acfb36f67e1de138cbe80a7db497f7d8c
Gitweb:        https://git.kernel.org/tip/f598181acfb36f67e1de138cbe80a7db497f7d8c
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:09 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:25 +01:00

x86/apic: Always provide irq_compose_msi_msg() method for vector domain

This shouldn't be dependent on PCI_MSI. HPET and I/O-APIC can deliver
interrupts through MSI without having any PCI in the system at all.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-10-dwmw2@infradead.org

---
 arch/x86/include/asm/apic.h   |  8 ++-----
 arch/x86/kernel/apic/apic.c   | 32 +++++++++++++++++++++++++++++-
 arch/x86/kernel/apic/msi.c    | 37 +----------------------------------
 arch/x86/kernel/apic/vector.c |  6 ++++++-
 4 files changed, 41 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index c1f64c6..34cb3c1 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -520,12 +520,10 @@ static inline void apic_smt_update(void) { }
 #endif
 
 struct msi_msg;
+struct irq_cfg;
 
-#ifdef CONFIG_PCI_MSI
-void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg);
-#else
-# define x86_vector_msi_compose_msg NULL
-#endif
+extern void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+				  bool dmar);
 
 extern void ioapic_zap_locks(void);
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 54f0435..4c15bf2 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -50,6 +50,7 @@
 #include <asm/io_apic.h>
 #include <asm/desc.h>
 #include <asm/hpet.h>
+#include <asm/msidef.h>
 #include <asm/mtrr.h>
 #include <asm/time.h>
 #include <asm/smp.h>
@@ -2480,6 +2481,37 @@ int hard_smp_processor_id(void)
 	return read_apic_id();
 }
 
+void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+			   bool dmar)
+{
+	msg->address_hi = MSI_ADDR_BASE_HI;
+
+	msg->address_lo =
+		MSI_ADDR_BASE_LO |
+		(apic->dest_mode_logical ?
+			MSI_ADDR_DEST_MODE_LOGICAL :
+			MSI_ADDR_DEST_MODE_PHYSICAL) |
+		MSI_ADDR_REDIRECTION_CPU |
+		MSI_ADDR_DEST_ID(cfg->dest_apicid);
+
+	msg->data =
+		MSI_DATA_TRIGGER_EDGE |
+		MSI_DATA_LEVEL_ASSERT |
+		MSI_DATA_DELIVERY_FIXED |
+		MSI_DATA_VECTOR(cfg->vector);
+
+	/*
+	 * Only the IOMMU itself can use the trick of putting destination
+	 * APIC ID into the high bits of the address. Anything else would
+	 * just be writing to memory if it tried that, and needs IR to
+	 * address higher APIC IDs.
+	 */
+	if (dmar)
+		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+	else
+		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
+}
+
 /*
  * Override the generic EOI implementation with an optimized version.
  * Only called during early boot when only one CPU is active and with
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 46ffd41..4eda617 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -15,7 +15,6 @@
 #include <linux/hpet.h>
 #include <linux/msi.h>
 #include <asm/irqdomain.h>
-#include <asm/msidef.h>
 #include <asm/hpet.h>
 #include <asm/hw_irq.h>
 #include <asm/apic.h>
@@ -23,42 +22,6 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
-static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
-				  bool dmar)
-{
-	msg->address_hi = MSI_ADDR_BASE_HI;
-
-	msg->address_lo =
-		MSI_ADDR_BASE_LO |
-		(apic->dest_mode_logical ?
-			MSI_ADDR_DEST_MODE_LOGICAL :
-			MSI_ADDR_DEST_MODE_PHYSICAL) |
-		MSI_ADDR_REDIRECTION_CPU |
-		MSI_ADDR_DEST_ID(cfg->dest_apicid);
-
-	msg->data =
-		MSI_DATA_TRIGGER_EDGE |
-		MSI_DATA_LEVEL_ASSERT |
-		MSI_DATA_DELIVERY_FIXED |
-		MSI_DATA_VECTOR(cfg->vector);
-
-	/*
-	 * Only the IOMMU itself can use the trick of putting destination
-	 * APIC ID into the high bits of the address. Anything else would
-	 * just be writing to memory if it tried that, and needs IR to
-	 * address higher APIC IDs.
-	 */
-	if (dmar)
-		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
-	else
-		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
-}
-
-void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
-{
-	__irq_msi_compose_msg(irqd_cfg(data), msg, false);
-}
-
 static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 {
 	struct msi_msg msg[2] = { [1] = { }, };
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 1eac536..bb2e2a2 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -818,6 +818,12 @@ void apic_ack_edge(struct irq_data *irqd)
 	apic_ack_irq(irqd);
 }
 
+static void x86_vector_msi_compose_msg(struct irq_data *data,
+				       struct msi_msg *msg)
+{
+       __irq_msi_compose_msg(irqd_cfg(data), msg, false);
+}
+
 static struct irq_chip lapic_controller = {
 	.name			= "APIC",
 	.irq_ack		= apic_ack_edge,

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/apic: Get rid of apic:: Dest_logical
  2020-10-24 21:35                     ` [PATCH v3 07/35] x86/apic: Get rid of apic::dest_logical David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     e57d04e5fa00f7649d4c00796f8d12054799be4a
Gitweb:        https://git.kernel.org/tip/e57d04e5fa00f7649d4c00796f8d12054799be4a
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:07 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:24 +01:00

x86/apic: Get rid of apic:: Dest_logical

struct apic has two members which store information about the destination
mode: dest_logical and irq_dest_mode.

dest_logical contains a mask which was historically used to set the
destination mode in IPI messages. Over time the usage was reduced and the
logical/physical functions were seperated.

There are only a few places which still use 'dest_logical' but they can
use 'irq_dest_mode' instead.

irq_dest_mode is actually a boolean where 0 means physical destination mode
and 1 means logical destination mode. Of course the name does not reflect
the functionality. This will be cleaned up in a subsequent change.

Remove apic::dest_logical and fixup the remaining users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-8-dwmw2@infradead.org

---
 arch/x86/include/asm/apic.h           | 2 --
 arch/x86/kernel/apic/apic.c           | 2 +-
 arch/x86/kernel/apic/apic_flat_64.c   | 8 ++------
 arch/x86/kernel/apic/apic_noop.c      | 4 +---
 arch/x86/kernel/apic/apic_numachip.c  | 8 ++------
 arch/x86/kernel/apic/bigsmp_32.c      | 4 +---
 arch/x86/kernel/apic/probe_32.c       | 4 +---
 arch/x86/kernel/apic/x2apic_cluster.c | 4 +---
 arch/x86/kernel/apic/x2apic_phys.c    | 4 +---
 arch/x86/kernel/apic/x2apic_uv_x.c    | 4 +---
 arch/x86/kernel/smpboot.c             | 5 +++--
 arch/x86/xen/apic.c                   | 4 +---
 12 files changed, 15 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 37a08f3..e230ed2 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -306,8 +306,6 @@ struct apic {
 	void	(*send_IPI_all)(int vector);
 	void	(*send_IPI_self)(int vector);
 
-	/* dest_logical is used by the IPI functions */
-	u32	dest_logical;
 	u32	disable_esr;
 
 	enum apic_delivery_modes delivery_mode;
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 113f6ca..29d28b3 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1591,7 +1591,7 @@ static void setup_local_APIC(void)
 	apic->init_apic_ldr();
 
 #ifdef CONFIG_X86_32
-	if (apic->dest_logical) {
+	if (apic->irq_dest_mode == 1) {
 		int logical_apicid, ldr_apicid;
 
 		/*
diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index 6df837f..bbb1b89 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -117,11 +117,9 @@ static struct apic apic_flat __ro_after_init = {
 	.irq_dest_mode			= 1, /* logical */
 
 	.disable_esr			= 0,
-	.dest_logical			= APIC_DEST_LOGICAL,
-	.check_apicid_used		= NULL,
 
+	.check_apicid_used		= NULL,
 	.init_apic_ldr			= flat_init_apic_ldr,
-
 	.ioapic_phys_id_map		= NULL,
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
@@ -210,11 +208,9 @@ static struct apic apic_physflat __ro_after_init = {
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
-	.dest_logical			= 0,
-	.check_apicid_used		= NULL,
 
+	.check_apicid_used		= NULL,
 	.init_apic_ldr			= physflat_init_apic_ldr,
-
 	.ioapic_phys_id_map		= NULL,
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
index 4fc934b..38f167c 100644
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -100,11 +100,9 @@ struct apic apic_noop __ro_after_init = {
 	.irq_dest_mode			= 1,
 
 	.disable_esr			= 0,
-	.dest_logical			= APIC_DEST_LOGICAL,
-	.check_apicid_used		= default_check_apicid_used,
 
+	.check_apicid_used		= default_check_apicid_used,
 	.init_apic_ldr			= noop_init_apic_ldr,
-
 	.ioapic_phys_id_map		= default_ioapic_phys_id_map,
 	.setup_apic_routing		= NULL,
 
diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c
index db715d0..4ebf9fe 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -250,11 +250,9 @@ static const struct apic apic_numachip1 __refconst = {
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
-	.dest_logical			= 0,
-	.check_apicid_used		= NULL,
 
+	.check_apicid_used		= NULL,
 	.init_apic_ldr			= flat_init_apic_ldr,
-
 	.ioapic_phys_id_map		= NULL,
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
@@ -299,11 +297,9 @@ static const struct apic apic_numachip2 __refconst = {
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
-	.dest_logical			= 0,
-	.check_apicid_used		= NULL,
 
+	.check_apicid_used		= NULL,
 	.init_apic_ldr			= flat_init_apic_ldr,
-
 	.ioapic_phys_id_map		= NULL,
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/apic/bigsmp_32.c b/arch/x86/kernel/apic/bigsmp_32.c
index 7f6461f..64c375b 100644
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -132,11 +132,9 @@ static struct apic apic_bigsmp __ro_after_init = {
 	.irq_dest_mode			= 0,
 
 	.disable_esr			= 1,
-	.dest_logical			= 0,
-	.check_apicid_used		= bigsmp_check_apicid_used,
 
+	.check_apicid_used		= bigsmp_check_apicid_used,
 	.init_apic_ldr			= bigsmp_init_apic_ldr,
-
 	.ioapic_phys_id_map		= bigsmp_ioapic_phys_id_map,
 	.setup_apic_routing		= bigsmp_setup_apic_routing,
 	.cpu_present_to_apicid		= bigsmp_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index 77c6e2e..97652aa 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -74,11 +74,9 @@ static struct apic apic_default __ro_after_init = {
 	.irq_dest_mode			= 1,
 
 	.disable_esr			= 0,
-	.dest_logical			= APIC_DEST_LOGICAL,
-	.check_apicid_used		= default_check_apicid_used,
 
+	.check_apicid_used		= default_check_apicid_used,
 	.init_apic_ldr			= default_init_apic_ldr,
-
 	.ioapic_phys_id_map		= default_ioapic_phys_id_map,
 	.setup_apic_routing		= setup_apic_flat_routing,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index f77e9fb..53390fc 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -188,11 +188,9 @@ static struct apic apic_x2apic_cluster __ro_after_init = {
 	.irq_dest_mode			= 1, /* logical */
 
 	.disable_esr			= 0,
-	.dest_logical			= APIC_DEST_LOGICAL,
-	.check_apicid_used		= NULL,
 
+	.check_apicid_used		= NULL,
 	.init_apic_ldr			= init_x2apic_ldr,
-
 	.ioapic_phys_id_map		= NULL,
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index 437e843..ee0c4d0 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -161,11 +161,9 @@ static struct apic apic_x2apic_phys __ro_after_init = {
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
-	.dest_logical			= 0,
-	.check_apicid_used		= NULL,
 
+	.check_apicid_used		= NULL,
 	.init_apic_ldr			= init_x2apic_ldr,
-
 	.ioapic_phys_id_map		= NULL,
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index 49deefd..d21a685 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -811,11 +811,9 @@ static struct apic apic_x2apic_uv_x __ro_after_init = {
 	.irq_dest_mode			= 0, /* Physical */
 
 	.disable_esr			= 0,
-	.dest_logical			= APIC_DEST_PHYSICAL,
-	.check_apicid_used		= NULL,
 
+	.check_apicid_used		= NULL,
 	.init_apic_ldr			= uv_init_apic_ldr,
-
 	.ioapic_phys_id_map		= NULL,
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index de776b2..6c14f10 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -747,13 +747,14 @@ static void __init smp_quirk_init_udelay(void)
 int
 wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip)
 {
+	u32 dm = apic->irq_dest_mode ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
 	unsigned long send_status, accept_status = 0;
 	int maxlvt;
 
 	/* Target chip */
 	/* Boot on the stack */
 	/* Kick the second */
-	apic_icr_write(APIC_DM_NMI | apic->dest_logical, apicid);
+	apic_icr_write(APIC_DM_NMI | dm, apicid);
 
 	pr_debug("Waiting for send to finish...\n");
 	send_status = safe_apic_wait_icr_idle();
@@ -980,7 +981,7 @@ wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
 	if (!boot_error) {
 		enable_start_cpu0 = 1;
 		*cpu0_nmi_registered = 1;
-		if (apic->dest_logical == APIC_DEST_LOGICAL)
+		if (apic->irq_dest_mode)
 			id = cpu0_logical_apicid;
 		else
 			id = apicid;
diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index e82fd19..c35c24b 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -152,11 +152,9 @@ static struct apic xen_pv_apic = {
 	/* .irq_dest_mode     - used in native_compose_msi_msg only */
 
 	.disable_esr			= 0,
-	/* .dest_logical      -  default_send_IPI_ use it but we use our own. */
-	.check_apicid_used		= default_check_apicid_used, /* Used on 32-bit */
 
+	.check_apicid_used		= default_check_apicid_used, /* Used on 32-bit */
 	.init_apic_ldr			= xen_noop, /* setup_local_APIC calls it */
-
 	.ioapic_phys_id_map		= default_ioapic_phys_id_map, /* Used on 32-bit */
 	.setup_apic_routing		= NULL,
 	.cpu_present_to_apicid		= xen_cpu_present_to_apicid,

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/apic: Cleanup destination mode
  2020-10-24 21:35                     ` [PATCH v3 08/35] x86/apic: Cleanup destination mode David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     8c44963b603db76e3e5f57d90d027657ba43c1fe
Gitweb:        https://git.kernel.org/tip/8c44963b603db76e3e5f57d90d027657ba43c1fe
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:08 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:25 +01:00

x86/apic: Cleanup destination mode

apic::irq_dest_mode is actually a boolean, but defined as u32 and named in
a way which does not explain what it means.

Make it a boolean and rename it to 'dest_mode_logical'

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-9-dwmw2@infradead.org

---
 arch/x86/include/asm/apic.h           | 2 +-
 arch/x86/kernel/apic/apic.c           | 2 +-
 arch/x86/kernel/apic/apic_flat_64.c   | 4 ++--
 arch/x86/kernel/apic/apic_noop.c      | 4 +---
 arch/x86/kernel/apic/apic_numachip.c  | 4 ++--
 arch/x86/kernel/apic/bigsmp_32.c      | 3 +--
 arch/x86/kernel/apic/io_apic.c        | 2 +-
 arch/x86/kernel/apic/msi.c            | 6 +++---
 arch/x86/kernel/apic/probe_32.c       | 3 +--
 arch/x86/kernel/apic/x2apic_cluster.c | 2 +-
 arch/x86/kernel/apic/x2apic_phys.c    | 2 +-
 arch/x86/kernel/apic/x2apic_uv_x.c    | 2 +-
 arch/x86/kernel/smpboot.c             | 7 ++-----
 arch/x86/platform/uv/uv_irq.c         | 2 +-
 arch/x86/xen/apic.c                   | 3 +--
 drivers/iommu/amd/amd_iommu_types.h   | 2 +-
 drivers/iommu/amd/iommu.c             | 8 ++++----
 drivers/iommu/intel/irq_remapping.c   | 2 +-
 18 files changed, 26 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index e230ed2..c1f64c6 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -309,7 +309,7 @@ struct apic {
 	u32	disable_esr;
 
 	enum apic_delivery_modes delivery_mode;
-	u32	irq_dest_mode;
+	bool	dest_mode_logical;
 
 	u32	(*calc_dest_apicid)(unsigned int cpu);
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 29d28b3..54f0435 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1591,7 +1591,7 @@ static void setup_local_APIC(void)
 	apic->init_apic_ldr();
 
 #ifdef CONFIG_X86_32
-	if (apic->irq_dest_mode == 1) {
+	if (apic->dest_mode_logical) {
 		int logical_apicid, ldr_apicid;
 
 		/*
diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index bbb1b89..8f72b43 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -114,7 +114,7 @@ static struct apic apic_flat __ro_after_init = {
 	.apic_id_registered		= flat_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	.irq_dest_mode			= 1, /* logical */
+	.dest_mode_logical		= true,
 
 	.disable_esr			= 0,
 
@@ -205,7 +205,7 @@ static struct apic apic_physflat __ro_after_init = {
 	.apic_id_registered		= flat_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	.irq_dest_mode			= 0, /* physical */
+	.dest_mode_logical		= false,
 
 	.disable_esr			= 0,
 
diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
index 38f167c..fe78319 100644
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -96,8 +96,7 @@ struct apic apic_noop __ro_after_init = {
 	.apic_id_registered		= noop_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	/* logical delivery broadcast to all CPUs: */
-	.irq_dest_mode			= 1,
+	.dest_mode_logical		= true,
 
 	.disable_esr			= 0,
 
@@ -105,7 +104,6 @@ struct apic apic_noop __ro_after_init = {
 	.init_apic_ldr			= noop_init_apic_ldr,
 	.ioapic_phys_id_map		= default_ioapic_phys_id_map,
 	.setup_apic_routing		= NULL,
-
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
 	.apicid_to_cpu_present		= physid_set_mask_of_physid,
 
diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c
index 4ebf9fe..a54d817 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -247,7 +247,7 @@ static const struct apic apic_numachip1 __refconst = {
 	.apic_id_registered		= numachip_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	.irq_dest_mode			= 0, /* physical */
+	.dest_mode_logical		= false,
 
 	.disable_esr			= 0,
 
@@ -294,7 +294,7 @@ static const struct apic apic_numachip2 __refconst = {
 	.apic_id_registered		= numachip_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	.irq_dest_mode			= 0, /* physical */
+	.dest_mode_logical		= false,
 
 	.disable_esr			= 0,
 
diff --git a/arch/x86/kernel/apic/bigsmp_32.c b/arch/x86/kernel/apic/bigsmp_32.c
index 64c375b..77555f6 100644
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -128,8 +128,7 @@ static struct apic apic_bigsmp __ro_after_init = {
 	.apic_id_registered		= bigsmp_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	/* phys delivery to target CPU: */
-	.irq_dest_mode			= 0,
+	.dest_mode_logical		= false,
 
 	.disable_esr			= 1,
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index cff6cbc..c6d92d2 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2950,7 +2950,7 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 {
 	memset(entry, 0, sizeof(*entry));
 	entry->delivery_mode = apic->delivery_mode;
-	entry->dest_mode     = apic->irq_dest_mode;
+	entry->dest_mode     = apic->dest_mode_logical;
 	entry->dest	     = cfg->dest_apicid;
 	entry->vector	     = cfg->vector;
 	entry->trigger	     = data->trigger;
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 516df47..46ffd41 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -30,9 +30,9 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 
 	msg->address_lo =
 		MSI_ADDR_BASE_LO |
-		((apic->irq_dest_mode == 0) ?
-			MSI_ADDR_DEST_MODE_PHYSICAL :
-			MSI_ADDR_DEST_MODE_LOGICAL) |
+		(apic->dest_mode_logical ?
+			MSI_ADDR_DEST_MODE_LOGICAL :
+			MSI_ADDR_DEST_MODE_PHYSICAL) |
 		MSI_ADDR_REDIRECTION_CPU |
 		MSI_ADDR_DEST_ID(cfg->dest_apicid);
 
diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index 97652aa..a61f642 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -70,8 +70,7 @@ static struct apic apic_default __ro_after_init = {
 	.apic_id_registered		= default_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	/* logical delivery broadcast to all CPUs: */
-	.irq_dest_mode			= 1,
+	.dest_mode_logical		= true,
 
 	.disable_esr			= 0,
 
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 53390fc..df6adc5 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -185,7 +185,7 @@ static struct apic apic_x2apic_cluster __ro_after_init = {
 	.apic_id_registered		= x2apic_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	.irq_dest_mode			= 1, /* logical */
+	.dest_mode_logical		= true,
 
 	.disable_esr			= 0,
 
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index ee0c4d0..0e4e819 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -158,7 +158,7 @@ static struct apic apic_x2apic_phys __ro_after_init = {
 	.apic_id_registered		= x2apic_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	.irq_dest_mode			= 0, /* physical */
+	.dest_mode_logical		= false,
 
 	.disable_esr			= 0,
 
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index d21a685..de94181 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -808,7 +808,7 @@ static struct apic apic_x2apic_uv_x __ro_after_init = {
 	.apic_id_registered		= uv_apic_id_registered,
 
 	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
-	.irq_dest_mode			= 0, /* Physical */
+	.dest_mode_logical		= false,
 
 	.disable_esr			= 0,
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6c14f10..d133d65 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -747,7 +747,7 @@ static void __init smp_quirk_init_udelay(void)
 int
 wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip)
 {
-	u32 dm = apic->irq_dest_mode ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
+	u32 dm = apic->dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
 	unsigned long send_status, accept_status = 0;
 	int maxlvt;
 
@@ -981,10 +981,7 @@ wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
 	if (!boot_error) {
 		enable_start_cpu0 = 1;
 		*cpu0_nmi_registered = 1;
-		if (apic->irq_dest_mode)
-			id = cpu0_logical_apicid;
-		else
-			id = apicid;
+		id = apic->dest_mode_logical ? cpu0_logical_apicid : apicid;
 		boot_error = wakeup_secondary_cpu_via_nmi(id, start_ip);
 	}
 
diff --git a/arch/x86/platform/uv/uv_irq.c b/arch/x86/platform/uv/uv_irq.c
index e7020d1..1a536a1 100644
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -36,7 +36,7 @@ static void uv_program_mmr(struct irq_cfg *cfg, struct uv_irq_2_mmr_pnode *info)
 	entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
 	entry->vector		= cfg->vector;
 	entry->delivery_mode	= apic->delivery_mode;
-	entry->dest_mode	= apic->irq_dest_mode;
+	entry->dest_mode	= apic->dest_mode_logical;
 	entry->polarity		= 0;
 	entry->trigger		= 0;
 	entry->mask		= 0;
diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index c35c24b..0d46cc2 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -148,8 +148,7 @@ static struct apic xen_pv_apic = {
 	.apic_id_valid 			= xen_id_always_valid,
 	.apic_id_registered 		= xen_id_always_registered,
 
-	/* .irq_delivery_mode - used in native_compose_msi_msg only */
-	/* .irq_dest_mode     - used in native_compose_msi_msg only */
+	/* .delivery_mode and .dest_mode_logical not used by XENPV */
 
 	.disable_esr			= 0,
 
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index f696ac7..ba74a72 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -893,7 +893,7 @@ struct amd_ir_data {
 };
 
 struct amd_irte_ops {
-	void (*prepare)(void *, u32, u32, u8, u32, int);
+	void (*prepare)(void *, u32, bool, u8, u32, int);
 	void (*activate)(void *, u16, u16);
 	void (*deactivate)(void *, u16, u16);
 	void (*set_affinity)(void *, u16, u16, u8, u32);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index bc81b91..d7f0c89 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3466,7 +3466,7 @@ static void free_irte(u16 devid, int index)
 }
 
 static void irte_prepare(void *entry,
-			 u32 delivery_mode, u32 dest_mode,
+			 u32 delivery_mode, bool dest_mode,
 			 u8 vector, u32 dest_apicid, int devid)
 {
 	union irte *irte = (union irte *) entry;
@@ -3480,7 +3480,7 @@ static void irte_prepare(void *entry,
 }
 
 static void irte_ga_prepare(void *entry,
-			    u32 delivery_mode, u32 dest_mode,
+			    u32 delivery_mode, bool dest_mode,
 			    u8 vector, u32 dest_apicid, int devid)
 {
 	struct irte_ga *irte = (struct irte_ga *) entry;
@@ -3672,7 +3672,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 	data->irq_2_irte.devid = devid;
 	data->irq_2_irte.index = index + sub_handle;
 	iommu->irte_ops->prepare(data->entry, apic->delivery_mode,
-				 apic->irq_dest_mode, irq_cfg->vector,
+				 apic->dest_mode_logical, irq_cfg->vector,
 				 irq_cfg->dest_apicid, devid);
 
 	switch (info->type) {
@@ -3943,7 +3943,7 @@ int amd_iommu_deactivate_guest_mode(void *data)
 	entry->hi.val = 0;
 
 	entry->lo.fields_remap.valid       = valid;
-	entry->lo.fields_remap.dm          = apic->irq_dest_mode;
+	entry->lo.fields_remap.dm          = apic->dest_mode_logical;
 	entry->lo.fields_remap.int_type    = apic->delivery_mode;
 	entry->hi.fields.vector            = cfg->vector;
 	entry->lo.fields_remap.destination =
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index d44e719..5628d43 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1113,7 +1113,7 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 	memset(irte, 0, sizeof(*irte));
 
 	irte->present = 1;
-	irte->dst_mode = apic->irq_dest_mode;
+	irte->dst_mode = apic->dest_mode_logical;
 	/*
 	 * Trigger mode in the IRTE will always be edge, and for IO-APIC, the
 	 * actual level or edge trigger will be setup in the IO-APIC

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/apic: Replace pointless apic:: Dest_logical usage
  2020-10-24 21:35                     ` [PATCH v3 06/35] x86/apic: Replace pointless apic::dest_logical usage David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     22e0db42097b30a49771744e514350b7e9dd26f2
Gitweb:        https://git.kernel.org/tip/22e0db42097b30a49771744e514350b7e9dd26f2
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:06 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:24 +01:00

x86/apic: Replace pointless apic:: Dest_logical usage

All these functions are only used for logical destination mode. So reading
the destination mode mask from the apic structure is a pointless
exercise. Just hand in the proper constant: APIC_DEST_LOGICAL.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-7-dwmw2@infradead.org

---
 arch/x86/kernel/apic/apic_flat_64.c   | 2 +-
 arch/x86/kernel/apic/ipi.c            | 6 +++---
 arch/x86/kernel/apic/x2apic_cluster.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index fdd38a1..6df837f 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -53,7 +53,7 @@ static void _flat_send_IPI_mask(unsigned long mask, int vector)
 	unsigned long flags;
 
 	local_irq_save(flags);
-	__default_send_IPI_dest_field(mask, vector, apic->dest_logical);
+	__default_send_IPI_dest_field(mask, vector, APIC_DEST_LOGICAL);
 	local_irq_restore(flags);
 }
 
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 387154e..d1fb874 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -260,7 +260,7 @@ void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
 	for_each_cpu(query_cpu, mask)
 		__default_send_IPI_dest_field(
 			early_per_cpu(x86_cpu_to_logical_apicid, query_cpu),
-			vector, apic->dest_logical);
+			vector, APIC_DEST_LOGICAL);
 	local_irq_restore(flags);
 }
 
@@ -279,7 +279,7 @@ void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
 			continue;
 		__default_send_IPI_dest_field(
 			early_per_cpu(x86_cpu_to_logical_apicid, query_cpu),
-			vector, apic->dest_logical);
+			vector, APIC_DEST_LOGICAL);
 		}
 	local_irq_restore(flags);
 }
@@ -297,7 +297,7 @@ void default_send_IPI_mask_logical(const struct cpumask *cpumask, int vector)
 
 	local_irq_save(flags);
 	WARN_ON(mask & ~cpumask_bits(cpu_online_mask)[0]);
-	__default_send_IPI_dest_field(mask, vector, apic->dest_logical);
+	__default_send_IPI_dest_field(mask, vector, APIC_DEST_LOGICAL);
 	local_irq_restore(flags);
 }
 
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 82fb43d..f77e9fb 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -61,7 +61,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
 		if (!dest)
 			continue;
 
-		__x2apic_send_IPI_dest(dest, vector, apic->dest_logical);
+		__x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL);
 		/* Remove cluster CPUs from tmpmask */
 		cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask);
 	}

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/apic: Cleanup delivery mode defines
  2020-10-24 21:35                     ` [PATCH v3 05/35] x86/apic: Cleanup delivery mode defines David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     721612994f53ed600b39a80d912b10f51960e2e3
Gitweb:        https://git.kernel.org/tip/721612994f53ed600b39a80d912b10f51960e2e3
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:05 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:24 +01:00

x86/apic: Cleanup delivery mode defines

The enum ioapic_irq_destination_types and the enumerated constants starting
with 'dest_' are gross misnomers because they describe the delivery mode.

Rename then enum and the constants so they actually make sense.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-6-dwmw2@infradead.org

---
 arch/x86/include/asm/apic.h           |  3 ++-
 arch/x86/include/asm/apicdef.h        | 16 +++++++---------
 arch/x86/kernel/apic/apic_flat_64.c   |  4 ++--
 arch/x86/kernel/apic/apic_noop.c      |  2 +-
 arch/x86/kernel/apic/apic_numachip.c  |  4 ++--
 arch/x86/kernel/apic/bigsmp_32.c      |  2 +-
 arch/x86/kernel/apic/io_apic.c        | 11 ++++++-----
 arch/x86/kernel/apic/probe_32.c       |  2 +-
 arch/x86/kernel/apic/x2apic_cluster.c |  2 +-
 arch/x86/kernel/apic/x2apic_phys.c    |  2 +-
 arch/x86/kernel/apic/x2apic_uv_x.c    |  6 +++---
 arch/x86/platform/uv/uv_irq.c         |  2 +-
 drivers/iommu/amd/iommu.c             |  4 ++--
 drivers/iommu/intel/irq_remapping.c   |  2 +-
 drivers/pci/controller/pci-hyperv.c   |  6 +++---
 15 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 57af25c..37a08f3 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -309,7 +309,8 @@ struct apic {
 	/* dest_logical is used by the IPI functions */
 	u32	dest_logical;
 	u32	disable_esr;
-	u32	irq_delivery_mode;
+
+	enum apic_delivery_modes delivery_mode;
 	u32	irq_dest_mode;
 
 	u32	(*calc_dest_apicid)(unsigned int cpu);
diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index 05e694e..5716f22 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -432,15 +432,13 @@ struct local_apic {
  #define BAD_APICID 0xFFFFu
 #endif
 
-enum ioapic_irq_destination_types {
-	dest_Fixed		= 0,
-	dest_LowestPrio		= 1,
-	dest_SMI		= 2,
-	dest__reserved_1	= 3,
-	dest_NMI		= 4,
-	dest_INIT		= 5,
-	dest__reserved_2	= 6,
-	dest_ExtINT		= 7
+enum apic_delivery_modes {
+	APIC_DELIVERY_MODE_FIXED	= 0,
+	APIC_DELIVERY_MODE_LOWESTPRIO   = 1,
+	APIC_DELIVERY_MODE_SMI		= 2,
+	APIC_DELIVERY_MODE_NMI		= 4,
+	APIC_DELIVERY_MODE_INIT		= 5,
+	APIC_DELIVERY_MODE_EXTINT	= 7,
 };
 
 #endif /* _ASM_X86_APICDEF_H */
diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index 7862b15..fdd38a1 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -113,7 +113,7 @@ static struct apic apic_flat __ro_after_init = {
 	.apic_id_valid			= default_apic_id_valid,
 	.apic_id_registered		= flat_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	.irq_dest_mode			= 1, /* logical */
 
 	.disable_esr			= 0,
@@ -206,7 +206,7 @@ static struct apic apic_physflat __ro_after_init = {
 	.apic_id_valid			= default_apic_id_valid,
 	.apic_id_registered		= flat_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
index 780c702..4fc934b 100644
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -95,7 +95,7 @@ struct apic apic_noop __ro_after_init = {
 	.apic_id_valid			= default_apic_id_valid,
 	.apic_id_registered		= noop_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	/* logical delivery broadcast to all CPUs: */
 	.irq_dest_mode			= 1,
 
diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c
index 35edd57..db715d0 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -246,7 +246,7 @@ static const struct apic apic_numachip1 __refconst = {
 	.apic_id_valid			= numachip_apic_id_valid,
 	.apic_id_registered		= numachip_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
@@ -295,7 +295,7 @@ static const struct apic apic_numachip2 __refconst = {
 	.apic_id_valid			= numachip_apic_id_valid,
 	.apic_id_registered		= numachip_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
diff --git a/arch/x86/kernel/apic/bigsmp_32.c b/arch/x86/kernel/apic/bigsmp_32.c
index 98d015a..7f6461f 100644
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -127,7 +127,7 @@ static struct apic apic_bigsmp __ro_after_init = {
 	.apic_id_valid			= default_apic_id_valid,
 	.apic_id_registered		= bigsmp_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	/* phys delivery to target CPU: */
 	.irq_dest_mode			= 0,
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 7b3c7e0..cff6cbc 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -535,7 +535,7 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 
 	/* Check delivery_mode to be sure we're not clearing an SMI pin */
 	entry = ioapic_read_entry(apic, pin);
-	if (entry.delivery_mode == dest_SMI)
+	if (entry.delivery_mode == APIC_DELIVERY_MODE_SMI)
 		return;
 
 	/*
@@ -1368,7 +1368,8 @@ void __init enable_IO_APIC(void)
 		/* If the interrupt line is enabled and in ExtInt mode
 		 * I have found the pin where the i8259 is connected.
 		 */
-		if ((entry.mask == 0) && (entry.delivery_mode == dest_ExtINT)) {
+		if ((entry.mask == 0) &&
+		    (entry.delivery_mode == APIC_DELIVERY_MODE_EXTINT)) {
 			ioapic_i8259.apic = apic;
 			ioapic_i8259.pin  = pin;
 			goto found_i8259;
@@ -1416,7 +1417,7 @@ void native_restore_boot_irq_mode(void)
 		entry.trigger		= IOAPIC_EDGE;
 		entry.polarity		= IOAPIC_POL_HIGH;
 		entry.dest_mode		= IOAPIC_DEST_MODE_PHYSICAL;
-		entry.delivery_mode	= dest_ExtINT;
+		entry.delivery_mode	= APIC_DELIVERY_MODE_EXTINT;
 		entry.dest		= read_apic_id();
 
 		/*
@@ -2047,7 +2048,7 @@ static inline void __init unlock_ExtINT_logic(void)
 	entry1.dest_mode = IOAPIC_DEST_MODE_PHYSICAL;
 	entry1.mask = IOAPIC_UNMASKED;
 	entry1.dest = hard_smp_processor_id();
-	entry1.delivery_mode = dest_ExtINT;
+	entry1.delivery_mode = APIC_DELIVERY_MODE_EXTINT;
 	entry1.polarity = entry0.polarity;
 	entry1.trigger = IOAPIC_EDGE;
 	entry1.vector = 0;
@@ -2948,7 +2949,7 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 			   struct IO_APIC_route_entry *entry)
 {
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode = apic->irq_delivery_mode;
+	entry->delivery_mode = apic->delivery_mode;
 	entry->dest_mode     = apic->irq_dest_mode;
 	entry->dest	     = cfg->dest_apicid;
 	entry->vector	     = cfg->vector;
diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index 67b6f7c..77c6e2e 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -69,7 +69,7 @@ static struct apic apic_default __ro_after_init = {
 	.apic_id_valid			= default_apic_id_valid,
 	.apic_id_registered		= default_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	/* logical delivery broadcast to all CPUs: */
 	.irq_dest_mode			= 1,
 
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index b0889c4..82fb43d 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -184,7 +184,7 @@ static struct apic apic_x2apic_cluster __ro_after_init = {
 	.apic_id_valid			= x2apic_apic_id_valid,
 	.apic_id_registered		= x2apic_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	.irq_dest_mode			= 1, /* logical */
 
 	.disable_esr			= 0,
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index e14eae6..437e843 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -157,7 +157,7 @@ static struct apic apic_x2apic_phys __ro_after_init = {
 	.apic_id_valid			= x2apic_apic_id_valid,
 	.apic_id_registered		= x2apic_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	.irq_dest_mode			= 0, /* physical */
 
 	.disable_esr			= 0,
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index 9ade9e6..49deefd 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -703,9 +703,9 @@ static void uv_send_IPI_one(int cpu, int vector)
 	unsigned long dmode, val;
 
 	if (vector == NMI_VECTOR)
-		dmode = dest_NMI;
+		dmode = APIC_DELIVERY_MODE_NMI;
 	else
-		dmode = dest_Fixed;
+		dmode = APIC_DELIVERY_MODE_FIXED;
 
 	val = (1UL << UVH_IPI_INT_SEND_SHFT) |
 		(apicid << UVH_IPI_INT_APIC_ID_SHFT) |
@@ -807,7 +807,7 @@ static struct apic apic_x2apic_uv_x __ro_after_init = {
 	.apic_id_valid			= uv_apic_id_valid,
 	.apic_id_registered		= uv_apic_id_registered,
 
-	.irq_delivery_mode		= dest_Fixed,
+	.delivery_mode			= APIC_DELIVERY_MODE_FIXED,
 	.irq_dest_mode			= 0, /* Physical */
 
 	.disable_esr			= 0,
diff --git a/arch/x86/platform/uv/uv_irq.c b/arch/x86/platform/uv/uv_irq.c
index 18ca226..e7020d1 100644
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -35,7 +35,7 @@ static void uv_program_mmr(struct irq_cfg *cfg, struct uv_irq_2_mmr_pnode *info)
 	mmr_value = 0;
 	entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
 	entry->vector		= cfg->vector;
-	entry->delivery_mode	= apic->irq_delivery_mode;
+	entry->delivery_mode	= apic->delivery_mode;
 	entry->dest_mode	= apic->irq_dest_mode;
 	entry->polarity		= 0;
 	entry->trigger		= 0;
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b9cf594..bc81b91 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3671,7 +3671,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
 
 	data->irq_2_irte.devid = devid;
 	data->irq_2_irte.index = index + sub_handle;
-	iommu->irte_ops->prepare(data->entry, apic->irq_delivery_mode,
+	iommu->irte_ops->prepare(data->entry, apic->delivery_mode,
 				 apic->irq_dest_mode, irq_cfg->vector,
 				 irq_cfg->dest_apicid, devid);
 
@@ -3944,7 +3944,7 @@ int amd_iommu_deactivate_guest_mode(void *data)
 
 	entry->lo.fields_remap.valid       = valid;
 	entry->lo.fields_remap.dm          = apic->irq_dest_mode;
-	entry->lo.fields_remap.int_type    = apic->irq_delivery_mode;
+	entry->lo.fields_remap.int_type    = apic->delivery_mode;
 	entry->hi.fields.vector            = cfg->vector;
 	entry->lo.fields_remap.destination =
 				APICID_TO_IRTE_DEST_LO(cfg->dest_apicid);
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 0cfce1d..d44e719 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1122,7 +1122,7 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 	 * irq migration in the presence of interrupt-remapping.
 	*/
 	irte->trigger_mode = 0;
-	irte->dlvry_mode = apic->irq_delivery_mode;
+	irte->dlvry_mode = apic->delivery_mode;
 	irte->vector = vector;
 	irte->dest_id = IRTE_DEST(dest);
 	irte->redir_hint = 1;
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 03ed5cb..6db8d96 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1226,7 +1226,7 @@ static void hv_irq_unmask(struct irq_data *data)
 	params->int_target.vector = cfg->vector;
 
 	/*
-	 * Honoring apic->irq_delivery_mode set to dest_Fixed by
+	 * Honoring apic->delivery_mode set to APIC_DELIVERY_MODE_FIXED by
 	 * setting the HV_DEVICE_INTERRUPT_TARGET_MULTICAST flag results in a
 	 * spurious interrupt storm. Not doing so does not seem to have a
 	 * negative effect (yet?).
@@ -1324,7 +1324,7 @@ static u32 hv_compose_msi_req_v1(
 	int_pkt->wslot.slot = slot;
 	int_pkt->int_desc.vector = vector;
 	int_pkt->int_desc.vector_count = 1;
-	int_pkt->int_desc.delivery_mode = dest_Fixed;
+	int_pkt->int_desc.delivery_mode = APIC_DELIVERY_MODE_FIXED;
 
 	/*
 	 * Create MSI w/ dummy vCPU set, overwritten by subsequent retarget in
@@ -1345,7 +1345,7 @@ static u32 hv_compose_msi_req_v2(
 	int_pkt->wslot.slot = slot;
 	int_pkt->int_desc.vector = vector;
 	int_pkt->int_desc.vector_count = 1;
-	int_pkt->int_desc.delivery_mode = dest_Fixed;
+	int_pkt->int_desc.delivery_mode = APIC_DELIVERY_MODE_FIXED;
 
 	/*
 	 * Create MSI w/ dummy vCPU set targeting just one vCPU, overwritten

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/devicetree: Fix the ioapic interrupt type table
  2020-10-24 21:35                     ` [PATCH v3 04/35] x86/devicetree: Fix the ioapic interrupt type table David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     2e730cb56b2cd1626fecaf23ef1537fb24721ef2
Gitweb:        https://git.kernel.org/tip/2e730cb56b2cd1626fecaf23ef1537fb24721ef2
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:04 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:24 +01:00

x86/devicetree: Fix the ioapic interrupt type table

The ioapic interrupt type table is wrong as it assumes that polarity in
IO/APIC context means active high when set. But the IO/APIC polarity is
working the other way round. This works because the ordering of the entries
is consistent with the device tree and the type information is not used by
the IO/APIC interrupt chip.

The whole trigger and polarity business of IO/APIC is misleading and the
corresponding constants which are defined as 0/1 are not used consistently
and are going to be removed.

Rename the type table members to 'is_level' and 'active_low' and adjust the
type information for consistency sake.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-5-dwmw2@infradead.org

---
 arch/x86/kernel/devicetree.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/devicetree.c b/arch/x86/kernel/devicetree.c
index ddffd80..6a4cb71 100644
--- a/arch/x86/kernel/devicetree.c
+++ b/arch/x86/kernel/devicetree.c
@@ -184,31 +184,31 @@ static unsigned int ioapic_id;
 
 struct of_ioapic_type {
 	u32 out_type;
-	u32 trigger;
-	u32 polarity;
+	u32 is_level;
+	u32 active_low;
 };
 
 static struct of_ioapic_type of_ioapic_type[] =
 {
 	{
-		.out_type	= IRQ_TYPE_EDGE_RISING,
-		.trigger	= IOAPIC_EDGE,
-		.polarity	= 1,
+		.out_type	= IRQ_TYPE_EDGE_FALLING,
+		.is_level	= 0,
+		.active_low	= 1,
 	},
 	{
-		.out_type	= IRQ_TYPE_LEVEL_LOW,
-		.trigger	= IOAPIC_LEVEL,
-		.polarity	= 0,
+		.out_type	= IRQ_TYPE_LEVEL_HIGH,
+		.is_level	= 1,
+		.active_low	= 0,
 	},
 	{
-		.out_type	= IRQ_TYPE_LEVEL_HIGH,
-		.trigger	= IOAPIC_LEVEL,
-		.polarity	= 1,
+		.out_type	= IRQ_TYPE_LEVEL_LOW,
+		.is_level	= 1,
+		.active_low	= 1,
 	},
 	{
-		.out_type	= IRQ_TYPE_EDGE_FALLING,
-		.trigger	= IOAPIC_EDGE,
-		.polarity	= 0,
+		.out_type	= IRQ_TYPE_EDGE_RISING,
+		.is_level	= 0,
+		.active_low	= 0,
 	},
 };
 
@@ -228,7 +228,7 @@ static int dt_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 		return -EINVAL;
 
 	it = &of_ioapic_type[type_index];
-	ioapic_set_alloc_attr(&tmp, NUMA_NO_NODE, it->trigger, it->polarity);
+	ioapic_set_alloc_attr(&tmp, NUMA_NO_NODE, it->is_level, it->active_low);
 	tmp.devid = mpc_ioapic_id(mp_irqdomain_ioapic_idx(domain));
 	tmp.ioapic.pin = fwspec->param[0];
 

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/msi: Only use high bits of MSI address for DMAR unit
  2020-10-24 21:35                     ` [PATCH v3 02/35] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     47bea873cf809f490cfac0c4e88533fd7fed6064
Gitweb:        https://git.kernel.org/tip/47bea873cf809f490cfac0c4e88533fd7fed6064
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:02 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:24 +01:00

x86/msi: Only use high bits of MSI address for DMAR unit

The Intel IOMMU has an MSI-like configuration for its interrupt, but it
isn't really MSI. So it gets to abuse the high 32 bits of the address, and
puts the high 24 bits of the extended APIC ID there.

This isn't something that can be used in the general case for real MSIs,
since external devices using the high bits of the address would be
performing writes to actual memory space above 4GiB, not targeted at the
APIC.

Factor the hack out and allow it only to be used when appropriate, adding a
WARN_ON_ONCE() if other MSIs are targeted at an unreachable APIC ID. That
should never happen since the compatibility MSI messages are not used when
Interrupt Remapping is enabled.

The x2apic_enabled() check isn't needed because Linux won't bring up CPUs
with higher APIC IDs unless IR and x2apic are enabled anyway.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-3-dwmw2@infradead.org

---
 arch/x86/kernel/apic/msi.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 6313f0a..516df47 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,13 +23,11 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
-static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
+static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+				  bool dmar)
 {
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
-	if (x2apic_enabled())
-		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
-
 	msg->address_lo =
 		MSI_ADDR_BASE_LO |
 		((apic->irq_dest_mode == 0) ?
@@ -43,18 +41,29 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 		MSI_DATA_LEVEL_ASSERT |
 		MSI_DATA_DELIVERY_FIXED |
 		MSI_DATA_VECTOR(cfg->vector);
+
+	/*
+	 * Only the IOMMU itself can use the trick of putting destination
+	 * APIC ID into the high bits of the address. Anything else would
+	 * just be writing to memory if it tried that, and needs IR to
+	 * address higher APIC IDs.
+	 */
+	if (dmar)
+		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+	else
+		WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
 }
 
 void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
-	__irq_msi_compose_msg(irqd_cfg(data), msg);
+	__irq_msi_compose_msg(irqd_cfg(data), msg, false);
 }
 
 static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 {
 	struct msi_msg msg[2] = { [1] = { }, };
 
-	__irq_msi_compose_msg(cfg, msg);
+	__irq_msi_compose_msg(cfg, msg, false);
 	irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
 }
 
@@ -276,6 +285,17 @@ struct irq_domain *arch_create_remap_msi_irq_domain(struct irq_domain *parent,
 #endif
 
 #ifdef CONFIG_DMAR_TABLE
+/*
+ * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the
+ * high bits of the destination APIC ID. This can't be done in the general
+ * case for MSIs as it would be targeting real memory above 4GiB not the
+ * APIC.
+ */
+static void dmar_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	__irq_msi_compose_msg(irqd_cfg(data), msg, true);
+}
+
 static void dmar_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
 {
 	dmar_msi_write(data->irq, msg);
@@ -288,6 +308,7 @@ static struct irq_chip dmar_msi_controller = {
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_set_affinity	= msi_domain_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_compose_msi_msg	= dmar_msi_compose_msg,
 	.irq_write_msi_msg	= dmar_msi_write_msg,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/apic/uv: Fix inconsistent destination mode
  2020-10-24 21:35                     ` [PATCH v3 03/35] x86/apic/uv: Fix inconsistent destination mode David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, David Woodhouse, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     93b7a3d6a1f0f159d390959de7a1b9ad863d6b26
Gitweb:        https://git.kernel.org/tip/93b7a3d6a1f0f159d390959de7a1b9ad863d6b26
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 24 Oct 2020 22:35:03 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:24 +01:00

x86/apic/uv: Fix inconsistent destination mode

The UV x2apic is strictly using physical destination mode, but
apic::dest_logical is initialized with APIC_DEST_LOGICAL.

This does not matter much because UV does not use any of the generic
functions which use apic::dest_logical, but is still inconsistent.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-4-dwmw2@infradead.org

---
 arch/x86/kernel/apic/x2apic_uv_x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index 714233c..9ade9e6 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -811,7 +811,7 @@ static struct apic apic_x2apic_uv_x __ro_after_init = {
 	.irq_dest_mode			= 0, /* Physical */
 
 	.disable_esr			= 0,
-	.dest_logical			= APIC_DEST_LOGICAL,
+	.dest_logical			= APIC_DEST_PHYSICAL,
 	.check_apicid_used		= NULL,
 
 	.init_apic_ldr			= uv_init_apic_ldr,

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/apic: Fix x2apic enablement without interrupt remapping
  2020-10-24 21:35                     ` [PATCH v3 01/35] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
@ 2020-10-29 12:15                       ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-10-29 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     26573a97746c7a99f394f9d398ce91a8853b3b89
Gitweb:        https://git.kernel.org/tip/26573a97746c7a99f394f9d398ce91a8853b3b89
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Sat, 24 Oct 2020 22:35:01 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Oct 2020 20:26:23 +01:00

x86/apic: Fix x2apic enablement without interrupt remapping

Currently, Linux as a hypervisor guest will enable x2apic only if there are
no CPUs present at boot time with an APIC ID above 255.

Hotplugging a CPU later with a higher APIC ID would result in a CPU which
cannot be targeted by external interrupts.

Add a filter in x2apic_apic_id_valid() which can be used to prevent such
CPUs from coming online, and allow x2apic to be enabled even if they are
present at boot time.

Fixes: ce69a784504 ("x86/apic: Enable x2APIC without interrupt remapping under KVM")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-2-dwmw2@infradead.org

---
 arch/x86/include/asm/apic.h        |  1 +
 arch/x86/kernel/apic/apic.c        | 14 ++++++++------
 arch/x86/kernel/apic/x2apic_phys.c |  9 +++++++++
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 4e3099d..57af25c 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -259,6 +259,7 @@ static inline u64 native_x2apic_icr_read(void)
 
 extern int x2apic_mode;
 extern int x2apic_phys;
+extern void __init x2apic_set_max_apicid(u32 apicid);
 extern void __init check_x2apic(void);
 extern void x2apic_setup(void);
 static inline int x2apic_enabled(void)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b3eef1d..113f6ca 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1841,20 +1841,22 @@ static __init void try_to_enable_x2apic(int remap_mode)
 		return;
 
 	if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
-		/* IR is required if there is APIC ID > 255 even when running
-		 * under KVM
+		/*
+		 * Using X2APIC without IR is not architecturally supported
+		 * on bare metal but may be supported in guests.
 		 */
-		if (max_physical_apicid > 255 ||
-		    !x86_init.hyper.x2apic_available()) {
+		if (!x86_init.hyper.x2apic_available()) {
 			pr_info("x2apic: IRQ remapping doesn't support X2APIC mode\n");
 			x2apic_disable();
 			return;
 		}
 
 		/*
-		 * without IR all CPUs can be addressed by IOAPIC/MSI
-		 * only in physical mode
+		 * Without IR, all CPUs can be addressed by IOAPIC/MSI only
+		 * in physical mode, and CPUs with an APIC ID that cannnot
+		 * be addressed must not be brought online.
 		 */
+		x2apic_set_max_apicid(255);
 		x2apic_phys = 1;
 	}
 	x2apic_enable();
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index bc96938..e14eae6 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -8,6 +8,12 @@
 int x2apic_phys;
 
 static struct apic apic_x2apic_phys;
+static u32 x2apic_max_apicid __ro_after_init;
+
+void __init x2apic_set_max_apicid(u32 apicid)
+{
+	x2apic_max_apicid = apicid;
+}
 
 static int __init set_x2apic_phys_mode(char *arg)
 {
@@ -98,6 +104,9 @@ static int x2apic_phys_probe(void)
 /* Common x2apic functions, also used by x2apic_cluster */
 int x2apic_apic_id_valid(u32 apicid)
 {
+	if (x2apic_max_apicid && apicid > x2apic_max_apicid)
+		return 0;
+
 	return 1;
 }
 

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
@ 2020-11-09 23:15                         ` Tom Lendacky
  2020-11-10  3:30                           ` [EXTERNAL] " David Woodhouse
  2020-11-10  6:10                           ` Borislav Petkov
  0 siblings, 2 replies; 192+ messages in thread
From: Tom Lendacky @ 2020-11-09 23:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner, David Woodhouse, x86

On 10/29/20 7:15 AM, tip-bot2 for Thomas Gleixner wrote:
> The following commit has been merged into the x86/apic branch of tip:
> 
> Commit-ID:     a27dca645d2c0f31abb7858aa0e10b2fa0f2f659
> Gitweb:        https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=a27dca645d2c0f31abb7858aa0e10b2fa0f2f659
> Author:        Thomas Gleixner <tglx@linutronix.de>
> AuthorDate:    Sat, 24 Oct 2020 22:35:19 +01:00
> Committer:     Thomas Gleixner <tglx@linutronix.de>
> CommitterDate: Wed, 28 Oct 2020 20:26:26 +01:00
> 
> x86/io_apic: Cleanup trigger/polarity helpers
> 
> 'trigger' and 'polarity' are used throughout the I/O-APIC code for handling
> the trigger type (edge/level) and the active low/high configuration. While
> there are defines for initializing these variables and struct members, they
> are not used consequently and the meaning of 'trigger' and 'polarity' is
> opaque and confusing at best.
> 
> Rename them to 'is_level' and 'active_low' and make them boolean in various
> structs so it's entirely clear what the meaning is.

Running the tip tree on my second generation EPYC system I'm seeing lots
of the following:

[  105.325371] hpet: Lost 9601 RTC interrupts
[  105.485766] hpet: Lost 9600 RTC interrupts
[  105.639182] hpet: Lost 9601 RTC interrupts
[  105.792155] hpet: Lost 9601 RTC interrupts
[  105.947076] hpet: Lost 9601 RTC interrupts
[  106.100876] hpet: Lost 9600 RTC interrupts
[  106.253444] hpet: Lost 9601 RTC interrupts
[  106.406722] hpet: Lost 9601 RTC interrupts

preventing the system from booting. I bisected it to this commit.

Additionally, I'm seeing warnings and error messages (which I haven't
bisected, yet) along these lines:

[   12.790801] WARNING: CPU: 135 PID: 1 at arch/x86/kernel/apic/apic.c:2505 __irq_msi_compose_msg+0x79/0x80
[   98.121716] irq 3: nobody cared (try booting with the "irqpoll" option)
[  100.692087] irq 15: nobody cared (try booting with the "irqpoll" option)
[  100.800217] irq 11: nobody cared (try booting with the "irqpoll" option)
[  100.800407] irq 10: nobody cared (try booting with the "irqpoll" option)

Thanks,
Tom


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-09 23:15                         ` Tom Lendacky
@ 2020-11-10  3:30                           ` David Woodhouse
  2020-11-10  6:10                           ` Borislav Petkov
  1 sibling, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-11-10  3:30 UTC (permalink / raw)
  To: Tom Lendacky, linux-kernel; +Cc: Thomas Gleixner, x86

[-- Attachment #1: Type: text/plain, Size: 2492 bytes --]

On Mon, 2020-11-09 at 17:15 -0600, Tom Lendacky wrote:
> On 10/29/20 7:15 AM, tip-bot2 for Thomas Gleixner wrote:
> > The following commit has been merged into the x86/apic branch of tip:
> > 
> > Commit-ID:     a27dca645d2c0f31abb7858aa0e10b2fa0f2f659
> > Gitweb:        https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=a27dca645d2c0f31abb7858aa0e10b2fa0f2f659
> > Author:        Thomas Gleixner <tglx@linutronix.de>
> > AuthorDate:    Sat, 24 Oct 2020 22:35:19 +01:00
> > Committer:     Thomas Gleixner <tglx@linutronix.de>
> > CommitterDate: Wed, 28 Oct 2020 20:26:26 +01:00
> > 
> > x86/io_apic: Cleanup trigger/polarity helpers
> > 
> > 'trigger' and 'polarity' are used throughout the I/O-APIC code for handling
> > the trigger type (edge/level) and the active low/high configuration. While
> > there are defines for initializing these variables and struct members, they
> > are not used consequently and the meaning of 'trigger' and 'polarity' is
> > opaque and confusing at best.
> > 
> > Rename them to 'is_level' and 'active_low' and make them boolean in various
> > structs so it's entirely clear what the meaning is.
> 
> Running the tip tree on my second generation EPYC system I'm seeing lots
> of the following:
> 
> [  105.325371] hpet: Lost 9601 RTC interrupts
> [  105.485766] hpet: Lost 9600 RTC interrupts
> [  105.639182] hpet: Lost 9601 RTC interrupts
> [  105.792155] hpet: Lost 9601 RTC interrupts
> [  105.947076] hpet: Lost 9601 RTC interrupts
> [  106.100876] hpet: Lost 9600 RTC interrupts
> [  106.253444] hpet: Lost 9601 RTC interrupts
> [  106.406722] hpet: Lost 9601 RTC interrupts
> 
> preventing the system from booting. I bisected it to this commit.
> 
> Additionally, I'm seeing warnings and error messages (which I haven't
> bisected, yet) along these lines:
> 
> [   12.790801] WARNING: CPU: 135 PID: 1 at arch/x86/kernel/apic/apic.c:2505 __irq_msi_compose_msg+0x79/0x80
> [   98.121716] irq 3: nobody cared (try booting with the "irqpoll" option)
> [  100.692087] irq 15: nobody cared (try booting with the "irqpoll" option)
> [  100.800217] irq 11: nobody cared (try booting with the "irqpoll" option)
> [  100.800407] irq 10: nobody cared (try booting with the "irqpoll" option)

Odd. The warning suggests we're asking __irq_msi_compose_msg() to
compose an MSI targeted at an APIC ID above 255, which shouldn't ever
happen.

Can we see a full boot log with apic=verbose please? 

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-09 23:15                         ` Tom Lendacky
  2020-11-10  3:30                           ` [EXTERNAL] " David Woodhouse
@ 2020-11-10  6:10                           ` Borislav Petkov
  2020-11-10 14:34                             ` Thomas Gleixner
  1 sibling, 1 reply; 192+ messages in thread
From: Borislav Petkov @ 2020-11-10  6:10 UTC (permalink / raw)
  To: Tom Lendacky; +Cc: linux-kernel, Thomas Gleixner, David Woodhouse, x86

On Mon, Nov 09, 2020 at 05:15:03PM -0600, Tom Lendacky wrote:
> [  105.325371] hpet: Lost 9601 RTC interrupts
> [  105.485766] hpet: Lost 9600 RTC interrupts
> [  105.639182] hpet: Lost 9601 RTC interrupts
> [  105.792155] hpet: Lost 9601 RTC interrupts
> [  105.947076] hpet: Lost 9601 RTC interrupts
> [  106.100876] hpet: Lost 9600 RTC interrupts
> [  106.253444] hpet: Lost 9601 RTC interrupts
> [  106.406722] hpet: Lost 9601 RTC interrupts
> 
> preventing the system from booting. I bisected it to this commit.

I bisected it to the exact same thing too, on an AMD laptop, after seeing what
you're seeing.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers
  2020-10-24 21:35                     ` [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
@ 2020-11-10  6:31                       ` Qian Cai
  2020-11-10  8:59                         ` David Woodhouse
  1 sibling, 1 reply; 192+ messages in thread
From: Qian Cai @ 2020-11-10  6:31 UTC (permalink / raw)
  To: David Woodhouse, x86, Thomas Gleixner
  Cc: kvm, iommu, joro, Paolo Bonzini, linux-kernel, linux-hyperv, maz,
	Dexuan Cui, Stephen Rothwell, Linux Next Mailing List

On Sat, 2020-10-24 at 22:35 +0100, David Woodhouse wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> 'trigger' and 'polarity' are used throughout the I/O-APIC code for handling
> the trigger type (edge/level) and the active low/high configuration. While
> there are defines for initializing these variables and struct members, they
> are not used consequently and the meaning of 'trigger' and 'polarity' is
> opaque and confusing at best.
> 
> Rename them to 'is_level' and 'active_low' and make them boolean in various
> structs so it's entirely clear what the meaning is.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/include/asm/hw_irq.h       |   6 +-
>  arch/x86/kernel/apic/io_apic.c      | 244 +++++++++++++---------------
>  arch/x86/pci/intel_mid_pci.c        |   8 +-
>  drivers/iommu/amd/iommu.c           |  10 +-
>  drivers/iommu/intel/irq_remapping.c |   9 +-
>  5 files changed, 130 insertions(+), 147 deletions(-)

Reverting the rest of patchset up to this commit on next-20201109 fixed an
endless soft-lockups issue booting an AMD server below. I noticed that the
failed boots always has this IOMMU IO_PAGE_FAULT before those soft-lockups:

[ 3404.093354][    T1] AMD-Vi: Interrupt remapping enabled
[ 3404.098593][    T1] AMD-Vi: Virtual APIC enabled
[ 3404.107783][  T340] pci 0000:00:14.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xfffffffdf8020200 flags=0x0008]
[ 3404.120644][    T1] AMD-Vi: Lazy IO/TLB flushing enabled
[ 3404.126011][    T1] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 3404.133173][    T1] software IO TLB: mapped [mem 0x0000000068dcf000-0x000000006cdcf000] (64MB)

.config (if ever matters):
https://cailca.coding.net/public/linux/mm/git/files/master/x86.config

good boot dmesg (with the commits reverted):
http://people.redhat.com/qcai/dmesg.txt

== system info ==
Dell Poweredge R6415
AMD EPYC 7401P 24-Core Processor
24576 MB memory, 239 GB disk space

[  OK  ] Started Flush Journal to Persistent Storage.
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Started udev Coldplug all Devices.
[  OK  ] Started Monitoring of LVM2 mirrors,…sing dmeventd or progress polling.
[  OK  ] Reached target Local File Systems (Pre).
         Mounting /boot...
[  OK  ] Created slice system-lvm2\x2dpvscan.slice.
[ 3740.376500][ T1058] XFS (sda1): Mounting V5 Filesystem
[ 3740.438474][ T1058] XFS (sda1): Ending clean mount
[ 3765.159433][    C0] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd:1]
[ 3765.166929][    C0] Modules linked in: acpi_cpufreq(+) ip_tables x_tables sd_mod ahci libahci tg3 bnxt_en megaraid_sas libata firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod
[ 3765.183576][    C0] irq event stamp: 26230104
[ 3765.187954][    C0] hardirqs last  enabled at (26230103): [<ffffffff82800b9e>] asm_common_interrupt+0x1e/0x40
[ 3765.197873][    C0] hardirqs last disabled at (26230104): [<ffffffff8277e45a>] sysvec_apic_timer_interrupt+0xa/0xa0
[ 3765.208303][    C0] softirqs last  enabled at (26202664): [<ffffffff82a0061b>] __do_softirq+0x61b/0x95d
[ 3765.217699][    C0] softirqs last disabled at (26202591): [<ffffffff82800ec2>] asm_call_irq_on_stack+0x12/0x20
[ 3765.227702][    C0] CPU: 0 PID: 1 Comm: systemd Not tainted 5.10.0-rc2-next-20201109+ #2
[ 3765.235793][    C0] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.9.3 06/25/2019
[ 3765.244065][    C0] RIP: 0010:lock_acquire+0x1f4/0x820
lock_acquire at kernel/locking/lockdep.c:5404
[ 3765.249211][    C0] Code: ff ff ff 48 83 c4 20 65 0f c1 05 a7 ba 9e 7e 83 f8 01 4c 8b 54 24 08 0f 85 60 04 00 00 41 52 9d 48 b8 00 00 00 00 00 fc ff df <48> 01 c3 c7 03 00 00 00 00 c7 43 08 00 00 00 00 48 8b0
[ 3765.268657][    C0] RSP: 0018:ffffc9000006f9f8 EFLAGS: 00000246
[ 3765.274587][    C0] RAX: dffffc0000000000 RBX: 1ffff9200000df42 RCX: 1ffff9200000df28
[ 3765.282420][    C0] RDX: 1ffff11020645769 RSI: 00000000ffffffff RDI: 0000000000000001
[ 3765.290256][    C0] RBP: 0000000000000001 R08: fffffbfff164cb10 R09: fffffbfff164cb10
[ 3765.298090][    C0] R10: 0000000000000246 R11: fffffbfff164cb0f R12: ffff88812be555b0
[ 3765.305922][    C0] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3765.313750][    C0] FS:  00007f12bb8c59c0(0000) GS:ffff8881b7000000(0000) knlGS:0000000000000000
[ 3765.322537][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3765.328985][    C0] CR2: 00007f0c2d828fd0 CR3: 000000011868a000 CR4: 00000000003506f0
[ 3765.336820][    C0] Call Trace:
[ 3765.339979][    C0]  ? rcu_read_unlock+0x40/0x40
[ 3765.344609][    C0]  __d_move+0x2a2/0x16f0
__seqprop_spinlock_assert at include/linux/seqlock.h:277
(inlined by) __d_move at fs/dcache.c:2861
[ 3765.348711][    C0]  ? d_move+0x47/0x70
[ 3765.352560][    C0]  ? _raw_spin_unlock+0x1a/0x30
[ 3765.357275][    C0]  d_move+0x47/0x70
write_seqcount_t_end at include/linux/seqlock.h:560
(inlined by) write_sequnlock at include/linux/seqlock.h:901
(inlined by) d_move at fs/dcache.c:2916
[ 3765.360951][    C0]  ? vfs_rename+0x9ac/0x1270
[ 3765.365402][    C0]  vfs_rename+0x9ac/0x1270
vfs_rename at fs/namei.c:4330
[ 3765.369687][    C0]  ? vfs_link+0x830/0x830
[ 3765.373878][    C0]  ? do_renameat2+0x58c/0x930
[ 3765.378419][    C0]  do_renameat2+0x58c/0x930
do_renameat2 at fs/namei.c:4455
[ 3765.382790][    C0]  ? __ia32_sys_link+0x80/0x80
[ 3765.387418][    C0]  ? rcu_read_lock_bh_held+0xb0/0xb0
[ 3765.392563][    C0]  ? rcu_read_lock_sched_held+0x9c/0xd0
[ 3765.397965][    C0]  ? rcu_read_lock_bh_held+0xb0/0xb0
[ 3765.403112][    C0]  ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
[ 3765.408955][    C0]  ? trace_hardirqs_on+0x1c/0x150
[ 3765.413841][    C0]  ? asm_common_interrupt+0x1e/0x40
[ 3765.418905][    C0]  ? strncpy_from_user+0x14c/0x330
[ 3765.423878][    C0]  ? strncpy_from_user+0x6b/0x330
[ 3765.428764][    C0]  ? getname_flags+0x88/0x3e0
[ 3765.433307][    C0]  __x64_sys_rename+0x75/0x90
__x64_sys_rename at fs/namei.c:4504
[ 3765.437848][    C0]  do_syscall_64+0x33/0x40
[ 3765.442129][    C0]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3765.447882][    C0] RIP: 0033:0x7f12b9b759eb
[ 3765.452166][    C0] Code: e8 5a 0a 08 00 85 c0 0f 95 c0 0f b6 c0 f7 d8 5b c3 66 0f 1f 44 00 00 b8 ff ff ff ff 5b c3 90 f3 0f 1e fa b8 52 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 69 e48
[ 3765.471611][    C0] RSP: 002b:00007fff76252538 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[ 3765.479880][    C0] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f12b9b759eb
[ 3765.487715][    C0] RDX: 000055ea2997f400 RSI: 00007fff76252570 RDI: 000055ea2997f450
[ 3765.495548][    C0] RBP: 00007fff76252570 R08: 0000000000000000 R09: 0000000065732e32
[ 3765.503374][    C0] R10: 0000000000000003 R11: 0000000000000246 R12: 000055ea2996fcf8
[ 3765.511201][    C0] R13: 000055ea2996fcf8 R14: 000055ea277097f0 R15: 0000000000000000
[ 3765.939428][   C14] watchdog: BUG: soft lockup - CPU#14 stuck for 23s! [systemd-udevd:976]
[ 3765.947707][   C14] Modules linked in: acpi_cpufreq(+) ip_tables x_tables sd_mod ahci libahci tg3 bnxt_en megaraid_sas libata firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod
[ 3765.964357][   C14] irq event stamp: 535074
[ 3765.968564][   C14] hardirqs last  enabled at (535073): [<ffffffff82800c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 3765.979266][   C14] hardirqs last disabled at (535074): [<ffffffff8277e45a>] sysvec_apic_timer_interrupt+0xa/0xa0
[ 3765.989532][   C14] softirqs last  enabled at (535072): [<ffffffff82a0061b>] __do_softirq+0x61b/0x95d
[ 3765.998761][   C14] softirqs last disabled at (535067): [<ffffffff82800ec2>] asm_call_irq_on_stack+0x12/0x20
[ 3766.008590][   C14] CPU: 14 PID: 976 Comm: systemd-udevd Tainted: G             L    5.10.0-rc2-next-20201109+ #2
[ 3766.018851][   C14] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.9.3 06/25/2019
[ 3766.027130][   C14] RIP: 0010:path_init+0x1dd/0x1380
__seqprop_spinlock_sequence at include/linux/seqlock.h:277
(inlined by) path_init at fs/namei.c:2219
[ 3766.032112][   C14] Code: 03 0f 8e 79 11 00 00 44 8b 3d 5f f0 6e 01 41 f6 c7 01 74 2e 48 b8 00 00 00 00 00 fc ff df 48 89 d1 48 c1 e9 03 48 01 c1 f3 90 <0f> b6 01 84 c0 74 08 3c 03 0f 8e 9a 0d 00 00 44 8b 3a1
[ 3766.051565][   C14] RSP: 0018:ffffc90006bdfa78 EFLAGS: 00000202
[ 3766.057493][   C14] RAX: 0000000000000000 RBX: ffffc90006bdfcc0 RCX: fffffbfff06426b0
[ 3766.065330][   C14] RDX: ffffffff83213580 RSI: 00000000d93ef353 RDI: ffffc90006bdfd00
[ 3766.073163][   C14] RBP: ffffc90006bdfb08 R08: fffffbfff164cb0d R09: fffffbfff164cb0d
[ 3766.080997][   C14] R10: 0000000000000246 R11: fffffbfff164cb0c R12: ffff8881159d0060
[ 3766.088831][   C14] R13: ffffc90006bdfcf8 R14: 0000000000000041 R15: 000000000000039d
[ 3766.096665][   C14] FS:  00007f0c2ec45980(0000) GS:ffff88866dec0000(0000) knlGS:0000000000000000
[ 3766.105454][   C14] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3766.111900][   C14] CR2: 00007f79350ceb84 CR3: 0000000160a84000 CR4: 00000000003506e0
[ 3766.119737][   C14] Call Trace:
[ 3766.122899][   C14]  ? debug_mutex_init+0x31/0x60
[ 3766.127614][   C14]  path_openat+0x154/0x1b00
path_openat at fs/namei.c:3364
[ 3766.131988][   C14]  ? rcu_read_lock_held+0xb0/0xb0
[ 3766.136881][   C14]  ? __lock_acquire+0x1dad/0x3890
[ 3766.141767][   C14]  ? path_parentat.isra.62+0x190/0x190
[ 3766.147090][   C14]  ? rcu_read_lock_sched_held+0x9c/0xd0
[ 3766.152496][   C14]  ? rcu_read_lock_bh_held+0xb0/0xb0
[ 3766.157646][   C14]  do_filp_open+0x171/0x240
[ 3766.162017][   C14]  ? may_open_dev+0xc0/0xc0
[ 3766.166388][   C14]  ? lock_downgrade+0x700/0x700
[ 3766.171104][   C14]  ? rcu_read_unlock+0x40/0x40
[ 3766.175735][   C14]  ? rwlock_bug.part.1+0x90/0x90
[ 3766.180543][   C14]  ? __check_object_size+0x454/0x670
[ 3766.185692][   C14]  ? do_raw_spin_unlock+0x4f/0x250
[ 3766.190668][   C14]  ? __alloc_fd+0x184/0x500
[ 3766.195031][   C14]  do_sys_openat2+0x2db/0x5b0
[ 3766.199576][   C14]  ? memset+0x1f/0x40
[ 3766.203422][   C14]  ? file_open_root+0x200/0x200
[ 3766.208136][   C14]  do_sys_open+0x85/0xd0
[ 3766.212245][   C14]  ? filp_open+0x50/0x50
[ 3766.216353][   C14]  ? syscall_trace_enter.isra.21+0x54/0x1f0
[ 3766.222107][   C14]  do_syscall_64+0x33/0x40
[ 3766.226389][   C14]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3766.232145][   C14] RIP: 0033:0x7f0c2d828552
[ 3766.236426][   C14] Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 15 50 2d 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 645
[ 3766.255873][   C14] RSP: 002b:00007ffc65d36000 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 3766.264147][   C14] RAX: ffffffffffffffda RBX: 0000564d81e50770 RCX: 00007f0c2d828552
[ 3766.271982][   C14] RDX: 0000000000080000 RSI: 00007ffc65d36150 RDI: 00000000ffffff9c
[ 3766.279817][   C14] RBP: 0000000000000008 R08: 0000000000000008 R09: 0000000000000001
[ 3766.287651][   C14] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f0c2e733347
[ 3766.295486][   C14] R13: 00007f0c2e733347 R14: 0000000000000001 R15: 0000564d81e2d1b8
[ 3766.469430][   C28] watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [mount:1058]
[ 3766.477105][   C28] Modules linked in: acpi_cpufreq(+) ip_tables x_tables sd_mod ahci libahci tg3 bnxt_en megaraid_sas libata firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod
[ 3766.493753][   C28] irq event stamp: 88756
[ 3766.497868][   C28] hardirqs last  enabled at (88755): [<ffffffff82800c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 3766.508481][   C28] hardirqs last disabled at (88756): [<ffffffff8277e45a>] sysvec_apic_timer_interrupt+0xa/0xa0
[ 3766.518665][   C28] softirqs last  enabled at (88748): [<ffffffff82a0061b>] __do_softirq+0x61b/0x95d
[ 3766.527805][   C28] softirqs last disabled at (88743): [<ffffffff82800ec2>] asm_call_irq_on_stack+0x12/0x20
[ 3766.537550][   C28] CPU: 28 PID: 1058 Comm: mount Tainted: G             L    5.10.0-rc2-next-20201109+ #2
[ 3766.547210][   C28] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.9.3 06/25/2019
[ 3766.555482][   C28] RIP: 0010:prepend_path.isra.6+0x349/0xdb0
__seqprop_spinlock_sequence at include/linux/seqlock.h:277
(inlined by) read_seqbegin at include/linux/seqlock.h:838
(inlined by) read_seqbegin_or_lock at include/linux/seqlock.h:1142
(inlined by) read_seqbegin_or_lock at include/linux/seqlock.h:1139
(inlined by) prepend_path at fs/d_path.c:99
[ 3766.561241][   C28] Code: 95 f0 fe ff ff 5a 41 54 9d 41 0f b6 55 00 84 d2 74 09 80 fa 03 0f 8e 75 09 00 00 45 8b 26 41 f6 c4 01 0f 84 5b 05 00 00 f3 90 <41> 0f b6 55 00 84 d2 74 e8 80 fa 03 7f e3 4c 89 f7 4ce
[ 3766.580687][   C28] RSP: 0018:ffffc9000670fa18 EFLAGS: 00000202
[ 3766.586618][   C28] RAX: ffff88812bd7d070 RBX: dffffc0000000000 RCX: 1ffff1102a164dc8
[ 3766.594449][   C28] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81bafabb
[ 3766.602287][   C28] RBP: ffffc9000670fb68 R08: 0000000000000001 R09: ffffffff81bb033d
[ 3766.610118][   C28] R10: ffff8881517d8340 R11: fffffbfff14b86e4 R12: 000000000000039d
[ 3766.617956][   C28] R13: fffffbfff06426b0 R14: ffffffff83213580 R15: ffffc9000670fbc0
[ 3766.625789][   C28] FS:  00007f96bb7a6080(0000) GS:ffff8881b71c0000(0000) knlGS:0000000000000000
[ 3766.634579][   C28] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3766.641026][   C28] CR2: 00007f96ba7343f0 CR3: 0000000146c76000 CR4: 00000000003506e0
[ 3766.648859][   C28] Call Trace:
[ 3766.652016][   C28]  ? dentry_path_raw+0x10/0x10
[ 3766.656645][   C28]  ? rcu_read_unlock+0x40/0x40
[ 3766.661276][   C28]  ? memcpy+0x38/0x60
[ 3766.665128][   C28]  d_path+0x3fd/0x660
[ 3766.668976][   C28]  ? prepend_path.isra.6+0xdb0/0xdb0
[ 3766.674129][   C28]  ? __alloc_pages_nodemask+0x650/0x760
[ 3766.679543][   C28]  ? __alloc_pages_slowpath.constprop.112+0x2120/0x2120
[ 3766.686345][   C28]  mnt_warn_timestamp_expiry+0x25a/0x270
mnt_warn_timestamp_expiry at fs/namespace.c:2555 (discriminator 1)
[ 3766.691839][   C28]  ? get_mountpoint+0x370/0x370
[ 3766.696554][   C28]  ? do_raw_spin_unlock+0x4f/0x250
[ 3766.701536][   C28]  ? _raw_spin_unlock+0x1a/0x30
[ 3766.706253][   C28]  ? vfs_create_mount+0x371/0x4a0
[ 3766.711150][   C28]  path_mount+0xe6b/0x1720
do_new_mount_fc at fs/namespace.c:2839
(inlined by) do_new_mount at fs/namespace.c:2898
(inlined by) path_mount at fs/namespace.c:3227
[ 3766.715438][   C28]  ? finish_automount+0x750/0x750
[ 3766.720328][   C28]  ? getname_flags+0x88/0x3e0
[ 3766.724877][   C28]  do_mount+0xc6/0xe0
[ 3766.728726][   C28]  ? path_mount+0x1720/0x1720
[ 3766.733275][   C28]  ? _copy_from_user+0xab/0xf0
[ 3766.737906][   C28]  ? memdup_user+0x4f/0x80
[ 3766.742193][   C28]  __x64_sys_mount+0x15d/0x1b0
[ 3766.746828][   C28]  do_syscall_64+0x33/0x40
[ 3766.751111][   C28]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3914.819439][   C36] watchdog: BUG: soft lockup - CPU#36 stuck for 22s! [systemd-journal:902]
[ 3914.827884][   C36] Modules linked in: acpi_cpufreq(+) ip_tables x_tables sd_mod ahci libahci tg3 bnxt_en megaraid_sas libata firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod
[ 3914.844543][   C36] irq event stamp: 196688
[ 3914.848748][   C36] hardirqs last  enabled at (196687): [<ffffffff82800c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 3914.859450][   C36] hardirqs last disabled at (196688): [<ffffffff8277e45a>] sysvec_apic_timer_interrupt+0xa/0xa0
[ 3914.869709][   C36] softirqs last  enabled at (196684): [<ffffffff82a0061b>] __do_softirq+0x61b/0x95d
[ 3914.878930][   C36] softirqs last disabled at (196679): [<ffffffff82800ec2>] asm_call_irq_on_stack+0x12/0x20
[ 3914.888759][   C36] CPU: 36 PID: 902 Comm: systemd-journal Tainted: G             L    5.10.0-rc2-next-20201109+ #2
[ 3914.899195][   C36] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.9.3 06/25/2019
[ 3914.907471][   C36] RIP: 0010:path_init+0x1f3/0x1380
[ 3914.912448][   C36] Code: 00 00 00 00 00 fc ff df 48 89 d1 48 c1 e9 03 48 01 c1 f3 90 0f b6 01 84 c0 74 08 3c 03 0f 8e 9a 0d 00 00 44 8b 3a 41 f6 c7 01 <75> e6 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 44 48 89a
[ 3914.931901][   C36] RSP: 0018:ffffc900065dfc08 EFLAGS: 00000202
[ 3914.937837][   C36] RAX: 0000000000000000 RBX: ffffc900065dfd40 RCX: fffffbfff06426b0
[ 3914.945675][   C36] RDX: ffffffff83213580 RSI: 00000000d93ef353 RDI: ffffc900065dfd80
[ 3914.953514][   C36] RBP: ffffc900065dfc98 R08: fffffbfff164cb0d R09: fffffbfff164cb0d
[ 3914.961349][   C36] R10: 0000000000000246 R11: fffffbfff164cb0c R12: ffff8881159d5920
[ 3914.969186][   C36] R13: ffffc900065dfd78 R14: 0000000000000041 R15: 000000000000039d
[ 3914.977028][   C36] FS:  00007fc5f95f9980(0000) GS:ffff8881b7240000(0000) knlGS:0000000000000000
[ 3914.985814][   C36] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3914.992262][   C36] CR2: 000055ea2994ba42 CR3: 000000013702a000 CR4: 00000000003506e0
[ 3915.000097][   C36] Call Trace:
[ 3915.003257][   C36]  path_lookupat.isra.61+0x20/0x440
[ 3915.008323][   C36]  ? find_held_lock+0x33/0x1c0
[ 3915.012954][   C36]  filename_lookup.part.73+0x15c/0x290
[ 3915.018283][   C36]  ? __might_fault+0xb5/0x160
[ 3915.022831][   C36]  ? do_symlinkat+0x190/0x190
[ 3915.027375][   C36]  ? strncpy_from_user+0x6b/0x330
[ 3915.032268][   C36]  ? getname_flags+0x88/0x3e0
[ 3915.036809][   C36]  ? kmem_cache_alloc+0x244/0x270
[ 3915.041701][   C36]  do_faccessat+0xc0/0x5a0
[ 3915.045987][   C36]  ? stream_open+0x60/0x60
[ 3915.050269][   C36]  do_syscall_64+0x33/0x40
[ 3915.054552][   C36]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3915.060306][   C36] RIP: 0033:0x7fc5f862697b
[ 3915.064590][   C36] Code: 77 05 c3 0f 1f 40 00 48 8b 15 09 f5 2c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa b8 15 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 d9 f48
[ 3915.084044][   C36] RSP: 002b:00007ffdefba1198 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
[ 3915.092322][   C36] RAX: ffffffffffffffda RBX: 00007ffdefba3cf0 RCX: 00007fc5f862697b
[ 3915.100163][   C36] RDX: 00007fc5f91032e8 RSI: 0000000000000000 RDI: 0000556a2401c0d0
[ 3915.107996][   C36] RBP: 00007ffdefba11e0 R08: 0000000000000000 R09: 0000000000000000
[ 3915.115832][   C36] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 3915.123667][   C36] R13: 0000000000000000 R14: 0000000000000009 R15: 0000556a254de950
[ 3917.159439][    C0] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
[ 3917.166926][    C0] Modules linked in: acpi_cpufreq(+) ip_tables x_tables sd_mod ahci libahci tg3 bnxt_en megaraid_sas libata firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod
[ 3917.183572][    C0] irq event stamp: 94238206
[ 3917.187944][    C0] hardirqs last  enabled at (94238205): [<ffffffff82800b9e>] asm_common_interrupt+0x1e/0x40
[ 3917.197857][    C0] hardirqs last disabled at (94238206): [<ffffffff8277e45a>] sysvec_apic_timer_interrupt+0xa/0xa0
[ 3917.208289][    C0] softirqs last  enabled at (94219940): [<ffffffff82a0061b>] __do_softirq+0x61b/0x95d
[ 3917.217685][    C0] softirqs last disabled at (94219881): [<ffffffff82800ec2>] asm_call_irq_on_stack+0x12/0x20
[ 3917.227688][    C0] CPU: 0 PID: 1 Comm: systemd Tainted: G             L    5.10.0-rc2-next-20201109+ #2
[ 3917.237167][    C0] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.9.3 06/25/2019
[ 3917.245437][    C0] RIP: 0010:lock_acquire+0x21b/0x820
[ 3917.250585][    C0] Code: fc ff df 48 01 c3 c7 03 00 00 00 00 c7 43 08 00 00 00 00 48 8b 84 24 90 00 00 00 65 48 33 04 25 28 00 00 00 0f 85 1a 05 00 00 <48> 81 c4 98 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3c
[ 3917.270033][    C0] RSP: 0018:ffffc9000006f9f8 EFLAGS: 00000246
[ 3917.275959][    C0] RAX: 0000000000000000 RBX: fffff5200000df42 RCX: 1ffff9200000df28
[ 3917.283795][    C0] RDX: 1ffff11020645769 RSI: 00000000ffffffff RDI: 0000000000000001
[ 3917.291627][    C0] RBP: 0000000000000001 R08: fffffbfff164cb10 R09: fffffbfff164cb10
[ 3917.299463][    C0] R10: 0000000000000246 R11: fffffbfff164cb0f R12: ffff88812be555b0
[ 3917.307297][    C0] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3917.315133][    C0] FS:  00007f12bb8c59c0(0000) GS:ffff8881b7000000(0000) knlGS:0000000000000000
[ 3917.323921][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3917.330368][    C0] CR2: 00007f0c2d828fd0 CR3: 000000011868a000 CR4: 00000000003506f0
[ 3917.338202][    C0] Call Trace:
[ 3917.341360][    C0]  ? rcu_read_unlock+0x40/0x40
[ 3917.345987][    C0]  __d_move+0x2a2/0x16f0
[ 3917.350096][    C0]  ? d_move+0x47/0x70
[ 3917.353944][    C0]  ? _raw_spin_unlock+0x1a/0x30
[ 3917.358656][    C0]  d_move+0x47/0x70
[ 3917.362330][    C0]  ? vfs_rename+0x9ac/0x1270
[ 3917.366786][    C0]  vfs_rename+0x9ac/0x1270
[ 3917.371068][    C0]  ? vfs_link+0x830/0x830
[ 3917.375261][    C0]  ? do_renameat2+0x58c/0x930
[ 3917.379804][    C0]  do_renameat2+0x58c/0x930
[ 3917.384173][    C0]  ? __ia32_sys_link+0x80/0x80
[ 3917.388800][    C0]  ? rcu_read_lock_bh_held+0xb0/0xb0
[ 3917.393948][    C0]  ? rcu_read_lock_sched_held+0x9c/0xd0
[ 3917.399354][    C0]  ? rcu_read_lock_bh_held+0xb0/0xb0
[ 3917.404503][    C0]  ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
[ 3917.410346][    C0]  ? trace_hardirqs_on+0x1c/0x150
[ 3917.415234][    C0]  ? asm_common_interrupt+0x1e/0x40
[ 3917.420294][    C0]  ? strncpy_from_user+0x14c/0x330
[ 3917.425269][    C0]  ? strncpy_from_user+0x6b/0x330
[ 3917.430158][    C0]  ? getname_flags+0x88/0x3e0
[ 3917.434699][    C0]  __x64_sys_rename+0x75/0x90
[ 3917.439239][    C0]  do_syscall_64+0x33/0x40
[ 3917.443520][    C0]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3917.449274][    C0] RIP: 0033:0x7f12b9b759eb
[ 3917.453557][    C0] Code: e8 5a 0a 08 00 85 c0 0f 95 c0 0f b6 c0 f7 d8 5b c3 66 0f 1f 44 00 00 b8 ff ff ff ff 5b c3 90 f3 0f 1e fa b8 52 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 69 e48
[ 3917.473003][    C0] RSP: 002b:00007fff76252538 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[ 3917.481272][    C0] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f12b9b759eb
[ 3917.489107][    C0] RDX: 000055ea2997f400 RSI: 00007fff76252570 RDI: 000055ea2997f450
[ 3917.496942][    C0] RBP: 00007fff76252570 R08: 0000000000000000 R09: 0000000065732e32
[ 3917.504776][    C0] R10: 0000000000000003 R11: 0000000000000246 R12: 000055ea2996fcf8
[ 3917.512610][    C0] R13: 000055ea2996fcf8 R14: 000055ea277097f0 R15: 0000000000000000


[ 3634.443982][   C37] NMI backtrace for cpu 37
[ 3634.443984][   C37] CPU: 37 PID: 906 Comm: udevadm Tainted: G             L    5.10.0-rc2-next-20201109+ #3
[ 3634.443986][   C37] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.9.3 06/25/2019
[ 3634.443987][   C37] RIP: 0010:io_serial_in+0x5c/0x90
[ 3634.443989][   C37] Code: 48 8d 7b 40 0f b6 8b f1 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 d3 e6 80 3c 02 00 75 1e 03 73 40 89 f2 ec <48> 83 c4 08 0f b6 c0 5b c3 89 74 24 04 e8 d2 90 81 ff4
[ 3634.443991][   C37] RSP: 0018:ffffc90004a38780 EFLAGS: 00000002
[ 3634.443993][   C37] RAX: dffffc0000000000 RBX: ffffffffa0b1c988 RCX: 0000000000000000
[ 3634.443995][   C37] RDX: 00000000000002fd RSI: 00000000000002fd RDI: ffffffffa0b1c9c8
[ 3634.443996][   C37] RBP: 0000000000000020 R08: 0000000000000004 R09: fffff520009470f3
[ 3634.443998][   C37] R10: 0000000000000003 R11: fffff520009470f3 R12: fffffbfff4163984
[ 3634.443999][   C37] R13: fffffbfff416393b R14: ffffffffa0b1c9d8 R15: ffffffffa0b1c988
[ 3634.444001][   C37] FS:  00007f6947084980(0000) GS:ffff88866e440000(0000) knlGS:0000000000000000
[ 3634.444002][   C37] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3634.444004][   C37] CR2: 00007fd220b94816 CR3: 000000015371e000 CR4: 00000000003506e0
[ 3634.444005][   C37] Call Trace:
[ 3634.444006][   C37]  <IRQ>
[ 3634.444007][   C37]  wait_for_xmitr+0x7c/0x1b0
[ 3634.444008][   C37]  ? wait_for_xmitr+0x1b0/0x1b0
[ 3634.444010][   C37]  serial8250_console_putchar+0x11/0x50
[ 3634.444011][   C37]  uart_console_write+0x39/0xc0
[ 3634.444012][   C37]  serial8250_console_write+0x74c/0x8a0
[ 3634.444013][   C37]  ? serial8250_start_tx+0x810/0x810
[ 3634.444015][   C37]  ? rcu_read_unlock+0x40/0x40
[ 3634.444016][   C37]  ? do_raw_spin_lock+0x121/0x290
[ 3634.444017][   C37]  ? rwlock_bug.part.1+0x90/0x90
[ 3634.444018][   C37]  ? prb_read_valid+0x61/0x90
[ 3634.444020][   C37]  ? prb_final_commit+0x10/0x10
[ 3634.444021][   C37]  console_unlock+0x721/0xa20
[ 3634.444022][   C37]  ? do_syslog.part.22+0x5d0/0x5d0
[ 3634.444023][   C37]  ? lock_acquire+0x1c8/0x820
[ 3634.444024][   C37]  ? lock_downgrade+0x700/0x700
[ 3634.444026][   C37]  vprintk_emit+0x201/0x2c0
[ 3634.444027][   C37]  printk+0x9a/0xc0
[ 3634.444028][   C37]  ? record_print_text.cold.38+0x11/0x11
[ 3634.444029][   C37]  ? __module_text_address+0xe/0x140
[ 3634.444031][   C37]  ? nmi_cpu_backtrace.cold.10+0x18/0xc5
[ 3634.444032][   C37]  show_trace_log_lvl+0x228/0x2bb
[ 3634.444033][   C37]  ? nmi_cpu_backtrace.cold.10+0x18/0xc5
[ 3634.444035][   C37]  ? nmi_cpu_backtrace.cold.10+0x18/0xc5
[ 3634.444036][   C37]  ? lapic_can_unplug_cpu+0x11/0x70
[ 3634.444037][   C37]  dump_stack+0x99/0xcb
[ 3634.444038][   C37]  nmi_cpu_backtrace.cold.10+0x18/0xc5
[ 3634.444040][   C37]  ? lapic_can_unplug_cpu+0x70/0x70
[ 3634.444041][   C37]  nmi_trigger_cpumask_backtrace+0x232/0x2a0
[ 3634.444042][   C37]  rcu_dump_cpu_stacks+0x1ad/0x1f9
[ 3634.444043][   C37]  rcu_sched_clock_irq.cold.90+0x4f7/0x968
[ 3634.444045][   C37]  ? tick_sched_do_timer+0x140/0x140
[ 3634.444046][   C37]  update_process_times+0x75/0xb0
[ 3634.444047][   C37]  tick_sched_timer+0xc8/0xf0
[ 3634.444048][   C37]  __hrtimer_run_queues+0x529/0xbb0
[ 3634.444050][   C37]  ? enqueue_hrtimer+0x330/0x330
[ 3634.444051][   C37]  ? ktime_get_update_offsets_now+0xd6/0x2b0
[ 3634.444052][   C37]  hrtimer_interrupt+0x2c2/0x750
[ 3634.444054][   C37]  __sysvec_apic_timer_interrupt+0x13c/0x520
[ 3634.444055][   C37]  asm_call_irq_on_stack+0x12/0x20
[ 3634.444056][   C37]  </IRQ>
[ 3634.444057][   C37]  sysvec_apic_timer_interrupt+0x85/0xa0
[ 3634.444059][   C37]  asm_sysvec_apic_timer_interrupt+0x12/0x20



^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10  6:31                       ` [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers Qian Cai
@ 2020-11-10  8:59                         ` David Woodhouse
  2020-11-10 16:26                           ` Paolo Bonzini
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-11-10  8:59 UTC (permalink / raw)
  To: Qian Cai, x86, Thomas Gleixner, Tom Murphy
  Cc: kvm, iommu, joro, Paolo Bonzini, linux-kernel, linux-hyperv, maz,
	Dexuan Cui, Stephen Rothwell, Linux Next Mailing List

[-- Attachment #1: Type: text/plain, Size: 2135 bytes --]

On Tue, 2020-11-10 at 01:31 -0500, Qian Cai wrote:
> On Sat, 2020-10-24 at 22:35 +0100, David Woodhouse wrote:
> > From: Thomas Gleixner <tglx@linutronix.de>
> > 
> > 'trigger' and 'polarity' are used throughout the I/O-APIC code for handling
> > the trigger type (edge/level) and the active low/high configuration. While
> > there are defines for initializing these variables and struct members, they
> > are not used consequently and the meaning of 'trigger' and 'polarity' is
> > opaque and confusing at best.
> > 
> > Rename them to 'is_level' and 'active_low' and make them boolean in various
> > structs so it's entirely clear what the meaning is.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> > ---
> >   arch/x86/include/asm/hw_irq.h       |   6 +-
> >   arch/x86/kernel/apic/io_apic.c      | 244 +++++++++++++---------------
> >   arch/x86/pci/intel_mid_pci.c        |   8 +-
> >   drivers/iommu/amd/iommu.c           |  10 +-
> >   drivers/iommu/intel/irq_remapping.c |   9 +-
> >   5 files changed, 130 insertions(+), 147 deletions(-)
> 
> Reverting the rest of patchset up to this commit on next-20201109 fixed an
> endless soft-lockups issue booting an AMD server below. I noticed that the
> failed boots always has this IOMMU IO_PAGE_FAULT before those soft-lockups:

Hm, attempting to reproduce this shows something else. Ever since
commit be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-
iommu api") in 5.5 the following stops working for me:

$ qemu-system-x86_64 -serial mon:stdio -kernel bzImage  -machine q35,accel=kvm,kernel-irqchip=split -m 2G -device amd-iommu,intremap=off -append "console=ttyS0 apic=verbose debug" -display none

It hasn't got a hard drive but I can watch the SATA interrupts fail as
it probes the CD-ROM:

[    7.403327] ata3.00: qc timeout (cmd 0xa1)
[    7.405980] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)

Adding 'iommu=off' to the kernel command line makes it work again, in
that it correctly panics at the lack of a root file system, quickly.



[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10  6:10                           ` Borislav Petkov
@ 2020-11-10 14:34                             ` Thomas Gleixner
  2020-11-10 14:55                               ` Tom Lendacky
  2020-11-10 17:47                               ` [tip: x86/apic] x86/ioapic: Correct the PCI/ISA trigger type selection tip-bot2 for Thomas Gleixner
  0 siblings, 2 replies; 192+ messages in thread
From: Thomas Gleixner @ 2020-11-10 14:34 UTC (permalink / raw)
  To: Borislav Petkov, Tom Lendacky
  Cc: linux-kernel, David Woodhouse, x86, Qian Cai

On Tue, Nov 10 2020 at 07:10, Borislav Petkov wrote:

> On Mon, Nov 09, 2020 at 05:15:03PM -0600, Tom Lendacky wrote:
>> [  105.325371] hpet: Lost 9601 RTC interrupts
>> [  105.485766] hpet: Lost 9600 RTC interrupts
>> [  105.639182] hpet: Lost 9601 RTC interrupts
>> [  105.792155] hpet: Lost 9601 RTC interrupts
>> [  105.947076] hpet: Lost 9601 RTC interrupts
>> [  106.100876] hpet: Lost 9600 RTC interrupts
>> [  106.253444] hpet: Lost 9601 RTC interrupts
>> [  106.406722] hpet: Lost 9601 RTC interrupts
>> 
>> preventing the system from booting. I bisected it to this commit.
>
> I bisected it to the exact same thing too, on an AMD laptop, after seeing what
> you're seeing.

Bah. I'm a moron.

--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -809,9 +809,9 @@ static bool irq_is_level(int idx)
 	case MP_IRQTRIG_DEFAULT:
 		/*
 		 * Conforms to spec, ie. bus-type dependent trigger
-		 * mode. PCI defaults to egde, ISA to level.
+		 * mode. PCI defaults to level, ISA to edge.
 		 */
-		level = test_bit(bus, mp_bus_not_pci);
+		level = !test_bit(bus, mp_bus_not_pci);
 		/* Take EISA into account */
 		return eisa_irq_is_level(idx, bus, level);
 	case MP_IRQTRIG_EDGE:

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 14:34                             ` Thomas Gleixner
@ 2020-11-10 14:55                               ` Tom Lendacky
  2020-11-10 15:54                                 ` Thomas Gleixner
  2020-11-10 17:47                               ` [tip: x86/apic] x86/ioapic: Correct the PCI/ISA trigger type selection tip-bot2 for Thomas Gleixner
  1 sibling, 1 reply; 192+ messages in thread
From: Tom Lendacky @ 2020-11-10 14:55 UTC (permalink / raw)
  To: Thomas Gleixner, Borislav Petkov
  Cc: linux-kernel, David Woodhouse, x86, Qian Cai

On 11/10/20 8:34 AM, Thomas Gleixner wrote:
> On Tue, Nov 10 2020 at 07:10, Borislav Petkov wrote:
> 
>> On Mon, Nov 09, 2020 at 05:15:03PM -0600, Tom Lendacky wrote:
>>> [  105.325371] hpet: Lost 9601 RTC interrupts
>>> [  105.485766] hpet: Lost 9600 RTC interrupts
>>> [  105.639182] hpet: Lost 9601 RTC interrupts
>>> [  105.792155] hpet: Lost 9601 RTC interrupts
>>> [  105.947076] hpet: Lost 9601 RTC interrupts
>>> [  106.100876] hpet: Lost 9600 RTC interrupts
>>> [  106.253444] hpet: Lost 9601 RTC interrupts
>>> [  106.406722] hpet: Lost 9601 RTC interrupts
>>>
>>> preventing the system from booting. I bisected it to this commit.
>>
>> I bisected it to the exact same thing too, on an AMD laptop, after seeing what
>> you're seeing.
> 
> Bah. I'm a moron.
> 
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -809,9 +809,9 @@ static bool irq_is_level(int idx)
>  	case MP_IRQTRIG_DEFAULT:
>  		/*
>  		 * Conforms to spec, ie. bus-type dependent trigger
> -		 * mode. PCI defaults to egde, ISA to level.
> +		 * mode. PCI defaults to level, ISA to edge.
>  		 */
> -		level = test_bit(bus, mp_bus_not_pci);
> +		level = !test_bit(bus, mp_bus_not_pci);
>  		/* Take EISA into account */
>  		return eisa_irq_is_level(idx, bus, level);
>  	case MP_IRQTRIG_EDGE:
> 

I was about to send the dmesg output when I saw this. A quick test with
this change resolves the boot issue, thanks!

I'm still seeing the warning at arch/x86/kernel/apic/apic.c:2527, but I'll
start a separate thread on that.

Thanks,
Tom

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 14:55                               ` Tom Lendacky
@ 2020-11-10 15:54                                 ` Thomas Gleixner
  2020-11-10 16:01                                   ` [EXTERNAL] " David Woodhouse
  2020-11-10 16:17                                   ` Tom Lendacky
  0 siblings, 2 replies; 192+ messages in thread
From: Thomas Gleixner @ 2020-11-10 15:54 UTC (permalink / raw)
  To: Tom Lendacky, Borislav Petkov
  Cc: linux-kernel, David Woodhouse, x86, Qian Cai

On Tue, Nov 10 2020 at 08:55, Tom Lendacky wrote:
> On 11/10/20 8:34 AM, Thomas Gleixner wrote:
> I was about to send the dmesg output when I saw this. A quick test with
> this change resolves the boot issue, thanks!

/me feels stupid

> I'm still seeing the warning at arch/x86/kernel/apic/apic.c:2527, but I'll
> start a separate thread on that.

Can you please provide the backtrace?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 15:54                                 ` Thomas Gleixner
@ 2020-11-10 16:01                                   ` David Woodhouse
  2020-11-10 16:17                                   ` Tom Lendacky
  1 sibling, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-11-10 16:01 UTC (permalink / raw)
  To: Thomas Gleixner, Tom Lendacky, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 1226 bytes --]

On Tue, 2020-11-10 at 16:54 +0100, Thomas Gleixner wrote:
> On Tue, Nov 10 2020 at 08:55, Tom Lendacky wrote:
> > On 11/10/20 8:34 AM, Thomas Gleixner wrote:
> > I was about to send the dmesg output when I saw this. A quick test
> > with
> > this change resolves the boot issue, thanks!
> 
> /me feels stupid
> 
> > I'm still seeing the warning at arch/x86/kernel/apic/apic.c:2527,
> > but I'll
> > start a separate thread on that.
> 
> Can you please provide the backtrace?

We'll want something like this, to stop AMD IOMMU from registering its
irqdomain even when it shouldn't. But that's the wrong way round; that
would give you remapped MSIs when it should be using the
x86_vector_domain for them. And you have the converse, it seems.

--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1601,9 +1601,11 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
 	if (ret)
 		return ret;
 
-	ret = amd_iommu_create_irq_domain(iommu);
-	if (ret)
-		return ret;
+	if (amd_iommu_irq_remap) {
+		ret = amd_iommu_create_irq_domain(iommu);
+		if (ret)
+			return ret;
+	}
 
 	/*
 	 * Make sure IOMMU is not considered to translate itself. The IVRS


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 15:54                                 ` Thomas Gleixner
  2020-11-10 16:01                                   ` [EXTERNAL] " David Woodhouse
@ 2020-11-10 16:17                                   ` Tom Lendacky
  2020-11-10 16:33                                     ` [EXTERNAL] " David Woodhouse
  1 sibling, 1 reply; 192+ messages in thread
From: Tom Lendacky @ 2020-11-10 16:17 UTC (permalink / raw)
  To: Thomas Gleixner, Borislav Petkov
  Cc: linux-kernel, David Woodhouse, x86, Qian Cai

On 11/10/20 9:54 AM, Thomas Gleixner wrote:
> On Tue, Nov 10 2020 at 08:55, Tom Lendacky wrote:
>> On 11/10/20 8:34 AM, Thomas Gleixner wrote:
>> I was about to send the dmesg output when I saw this. A quick test with
>> this change resolves the boot issue, thanks!
> 
> /me feels stupid
> 
>> I'm still seeing the warning at arch/x86/kernel/apic/apic.c:2527, but I'll
>> start a separate thread on that.
> 
> Can you please provide the backtrace?

Yep. The warning started triggering with:
47bea873cf80 ("x86/msi: Only use high bits of MSI address for DMAR unit")

Here's the backtrace:

[   15.611109] ------------[ cut here ]------------
[   15.616274] WARNING: CPU: 184 PID: 1 at arch/x86/kernel/apic/apic.c:2505 __irq_msi_compose_msg+0x79/0x80
[   15.626855] Modules linked in:
[   15.630263] CPU: 184 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc1-sos-custom #1
[   15.638516] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS REX1006G 01/25/2020
[   15.647549] RIP: 0010:__irq_msi_compose_msg+0x79/0x80
[   15.653188] Code: 0f f0 ff 09 d0 89 06 8b 47 04 c7 46 04 00 00 00 00 88 46 08 45 84 c0 74 08 8b 07 30 c0 89 46 04 c3 81 3f ff 00 00 00 77 01 c3 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 8b 05 81 f9 04 02 85 c0 75 16
[   15.674140] RSP: 0018:ffffc900000c7c30 EFLAGS: 00010212
[   15.679971] RAX: 0000000000000021 RBX: ffff888143789028 RCX: 0000000000000000
[   15.687936] RDX: 0000000000000000 RSI: ffffc900000c7c40 RDI: ffff8881447dd1c0
[   15.695899] RBP: ffff888143789028 R08: 0000000000000000 R09: 0000000000000005
[   15.703861] R10: 0000000000000002 R11: 0000000000000004 R12: ffff88900e7b80c0
[   15.711825] R13: 0000000000000000 R14: ffff88907646ba80 R15: 0000000000000000
[   15.719789] FS:  0000000000000000(0000) GS:ffff88900f000000(0000) knlGS:0000000000000000
[   15.728821] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.735234] CR2: 0000000000000000 CR3: 000000114d40a000 CR4: 0000000000350ee0
[   15.743197] Call Trace:
[   15.745929]  irq_chip_compose_msi_msg+0x2e/0x40
[   15.750984]  msi_domain_activate+0x4b/0x90
[   15.755556]  __irq_domain_activate_irq+0x53/0x80
[   15.760707]  ? irq_set_msi_desc_off+0x5a/0x90
[   15.765568]  irq_domain_activate_irq+0x25/0x40
[   15.770525]  __msi_domain_alloc_irqs+0x16a/0x310
[   15.775680]  __pci_enable_msi_range+0x182/0x2b0
[   15.780738]  ? e820__memblock_setup+0x7d/0x7d
[   15.785597]  pci_enable_msi+0x16/0x30
[   15.789685]  iommu_init_msi+0x30/0x190
[   15.793860]  state_next+0x39d/0x665
[   15.797754]  ? e820__memblock_setup+0x7d/0x7d
[   15.802615]  iommu_go_to_state+0x24/0x28
[   15.806993]  amd_iommu_init+0x11/0x46
[   15.811076]  pci_iommu_init+0x16/0x3f
[   15.815165]  do_one_initcall+0x44/0x1d0
[   15.819447]  kernel_init_freeable+0x1e7/0x249
[   15.824311]  ? rest_init+0xb4/0xb4
[   15.828097]  kernel_init+0xa/0x10c
[   15.831889]  ret_from_fork+0x22/0x30
[   15.835883] CPU: 184 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc1-sos-custom #1
[   15.836876] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS REX1006G 01/25/2020
[   15.836876] Call Trace:
[   15.836876]  dump_stack+0x6d/0x88
[   15.836876]  __warn.cold+0x24/0x3d
[   15.836876]  ? __irq_msi_compose_msg+0x79/0x80
[   15.836876]  report_bug+0xd1/0x100
[   15.836876]  handle_bug+0x35/0x80
[   15.836876]  exc_invalid_op+0x14/0x70
[   15.836876]  asm_exc_invalid_op+0x12/0x20

[   15.836876] RIP: 0010:__irq_msi_compose_msg+0x79/0x80
[   15.836876] Code: 0f f0 ff 09 d0 89 06 8b 47 04 c7 46 04 00 00 00 00 88 46 08 45 84 c0 74 08 8b 07 30 c0 89 46 04 c3 81 3f ff 00 00 00 77 01 c3 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 8b 05 81 f9 04 02 85 c0 75 16
[   15.836876] RSP: 0018:ffffc900000c7c30 EFLAGS: 00010212
[   15.836876] RAX: 0000000000000021 RBX: ffff888143789028 RCX: 0000000000000000
[   15.836876] RDX: 0000000000000000 RSI: ffffc900000c7c40 RDI: ffff8881447dd1c0
[   15.836876] RBP: ffff888143789028 R08: 0000000000000000 R09: 0000000000000005
[   15.836876] R10: 0000000000000002 R11: 0000000000000004 R12: ffff88900e7b80c0
[   15.836876] R13: 0000000000000000 R14: ffff88907646ba80 R15: 0000000000000000
[   15.836876]  irq_chip_compose_msi_msg+0x2e/0x40
[   15.836876]  msi_domain_activate+0x4b/0x90
[   15.836876]  __irq_domain_activate_irq+0x53/0x80
[   15.836876]  ? irq_set_msi_desc_off+0x5a/0x90
[   15.836876]  irq_domain_activate_irq+0x25/0x40
[   15.836876]  __msi_domain_alloc_irqs+0x16a/0x310
[   15.836876]  __pci_enable_msi_range+0x182/0x2b0
[   15.836876]  ? e820__memblock_setup+0x7d/0x7d
[   15.836876]  pci_enable_msi+0x16/0x30
[   15.836876]  iommu_init_msi+0x30/0x190
[   15.836876]  state_next+0x39d/0x665
[   15.836876]  ? e820__memblock_setup+0x7d/0x7d
[   15.836876]  iommu_go_to_state+0x24/0x28
[   15.836876]  amd_iommu_init+0x11/0x46
[   15.836876]  pci_iommu_init+0x16/0x3f
[   15.836876]  do_one_initcall+0x44/0x1d0
[   15.836876]  kernel_init_freeable+0x1e7/0x249
[   15.836876]  ? rest_init+0xb4/0xb4
[   15.836876]  kernel_init+0xa/0x10c
[   15.836876]  ret_from_fork+0x22/0x30
[   16.046524] ---[ end trace 2fb272e5ca8df1c8 ]---

Thanks,
Tom

> 
> Thanks,
> 
>         tglx
> 

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10  8:59                         ` David Woodhouse
@ 2020-11-10 16:26                           ` Paolo Bonzini
  0 siblings, 0 replies; 192+ messages in thread
From: Paolo Bonzini @ 2020-11-10 16:26 UTC (permalink / raw)
  To: David Woodhouse, Qian Cai, x86, Thomas Gleixner, Tom Murphy
  Cc: kvm, iommu, joro, linux-kernel, linux-hyperv, maz, Dexuan Cui,
	Stephen Rothwell, Linux Next Mailing List

On 10/11/20 09:59, David Woodhouse wrote:
> Hm, attempting to reproduce this shows something else. Ever since
> commit be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-
> iommu api") in 5.5 the following stops working for me:
> 
> $ qemu-system-x86_64 -serial mon:stdio -kernel bzImage  -machine q35,accel=kvm,kernel-irqchip=split -m 2G -device amd-iommu,intremap=off -append "console=ttyS0 apic=verbose debug" -display none
> 
> It hasn't got a hard drive but I can watch the SATA interrupts fail as
> it probes the CD-ROM:
> 
> [    7.403327] ata3.00: qc timeout (cmd 0xa1)
> [    7.405980] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> 
> Adding 'iommu=off' to the kernel command line makes it work again, in
> that it correctly panics at the lack of a root file system, quickly.

That might well be a QEMU bug though, AMD emulation is kinda experimental.

Paolo

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 16:17                                   ` Tom Lendacky
@ 2020-11-10 16:33                                     ` David Woodhouse
  2020-11-10 17:04                                       ` Tom Lendacky
  2020-11-10 17:50                                       ` Thomas Gleixner
  0 siblings, 2 replies; 192+ messages in thread
From: David Woodhouse @ 2020-11-10 16:33 UTC (permalink / raw)
  To: Tom Lendacky, Thomas Gleixner, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 3017 bytes --]

On Tue, 2020-11-10 at 10:17 -0600, Tom Lendacky wrote:
> Yep. The warning started triggering with:
> 47bea873cf80 ("x86/msi: Only use high bits of MSI address for DMAR unit")
> 
> Here's the backtrace:
> 
> [   15.611109] ------------[ cut here ]------------
> [   15.616274] WARNING: CPU: 184 PID: 1 at arch/x86/kernel/apic/apic.c:2505 __irq_msi_compose_msg+0x79/0x80
> [   15.626855] Modules linked in:
> [   15.630263] CPU: 184 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc1-sos-custom #1
> [   15.638516] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS REX1006G 01/25/2020
> [   15.647549] RIP: 0010:__irq_msi_compose_msg+0x79/0x80
> [   15.653188] Code: 0f f0 ff 09 d0 89 06 8b 47 04 c7 46 04 00 00 00 00 88 46 08 45 84 c0 74 08 8b 07 30 c0 89 46 04 c3 81 3f ff 00 00 00 77 01 c3 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 8b 05 81 f9 04 02 85 c0 75 16
> [   15.674140] RSP: 0018:ffffc900000c7c30 EFLAGS: 00010212
> [   15.679971] RAX: 0000000000000021 RBX: ffff888143789028 RCX: 0000000000000000
> [   15.687936] RDX: 0000000000000000 RSI: ffffc900000c7c40 RDI: ffff8881447dd1c0
> [   15.695899] RBP: ffff888143789028 R08: 0000000000000000 R09: 0000000000000005
> [   15.703861] R10: 0000000000000002 R11: 0000000000000004 R12: ffff88900e7b80c0
> [   15.711825] R13: 0000000000000000 R14: ffff88907646ba80 R15: 0000000000000000
> [   15.719789] FS:  0000000000000000(0000) GS:ffff88900f000000(0000) knlGS:0000000000000000
> [   15.728821] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   15.735234] CR2: 0000000000000000 CR3: 000000114d40a000 CR4: 0000000000350ee0
> [   15.743197] Call Trace:
> [   15.745929]  irq_chip_compose_msi_msg+0x2e/0x40
> [   15.750984]  msi_domain_activate+0x4b/0x90
> [   15.755556]  __irq_domain_activate_irq+0x53/0x80
> [   15.760707]  ? irq_set_msi_desc_off+0x5a/0x90
> [   15.765568]  irq_domain_activate_irq+0x25/0x40
> [   15.770525]  __msi_domain_alloc_irqs+0x16a/0x310
> [   15.775680]  __pci_enable_msi_range+0x182/0x2b0
> [   15.780738]  ? e820__memblock_setup+0x7d/0x7d
> [   15.785597]  pci_enable_msi+0x16/0x30
> [   15.789685]  iommu_init_msi+0x30/0x190
> [   15.793860]  state_next+0x39d/0x665
> [   15.797754]  ? e820__memblock_setup+0x7d/0x7d
> [   15.802615]  iommu_go_to_state+0x24/0x28
> [   15.806993]  amd_iommu_init+0x11/0x46
> [   15.811076]  pci_iommu_init+0x16/0x3f

Oh joy.

It's asking the core code to generate a PCI MSI message for it and
actually program that to the PCI device (since the IOMMU itself is a
PCI device).

That isn't actually used for generating MSI, but is instead interpreted
to write the intcapxt registers which *do* generate the interrupts.

That wants fixing, preferably not to go via MSI format at all, or maybe
just to use the 'dmar' flag to __irq_msi_compose_msg(). Either way by
having an irqdomain of its own like the Intel IOMMU does.

If I could get post-5.5 kernels to boot at all with the AMD IOMMU
enabled, I'd have a go at throwing that together now...


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 16:33                                     ` [EXTERNAL] " David Woodhouse
@ 2020-11-10 17:04                                       ` Tom Lendacky
  2020-11-10 17:50                                       ` Thomas Gleixner
  1 sibling, 0 replies; 192+ messages in thread
From: Tom Lendacky @ 2020-11-10 17:04 UTC (permalink / raw)
  To: David Woodhouse, Thomas Gleixner, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

On 11/10/20 10:33 AM, David Woodhouse wrote:
> On Tue, 2020-11-10 at 10:17 -0600, Tom Lendacky wrote:
>> Yep. The warning started triggering with:
>> 47bea873cf80 ("x86/msi: Only use high bits of MSI address for DMAR unit")
>>
>> Here's the backtrace:
>>
>> [   15.611109] ------------[ cut here ]------------
>> [   15.616274] WARNING: CPU: 184 PID: 1 at arch/x86/kernel/apic/apic.c:2505 __irq_msi_compose_msg+0x79/0x80
>> [   15.626855] Modules linked in:
>> [   15.630263] CPU: 184 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc1-sos-custom #1
>> [   15.638516] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS REX1006G 01/25/2020
>> [   15.647549] RIP: 0010:__irq_msi_compose_msg+0x79/0x80
>> [   15.653188] Code: 0f f0 ff 09 d0 89 06 8b 47 04 c7 46 04 00 00 00 00 88 46 08 45 84 c0 74 08 8b 07 30 c0 89 46 04 c3 81 3f ff 00 00 00 77 01 c3 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 8b 05 81 f9 04 02 85 c0 75 16
>> [   15.674140] RSP: 0018:ffffc900000c7c30 EFLAGS: 00010212
>> [   15.679971] RAX: 0000000000000021 RBX: ffff888143789028 RCX: 0000000000000000
>> [   15.687936] RDX: 0000000000000000 RSI: ffffc900000c7c40 RDI: ffff8881447dd1c0
>> [   15.695899] RBP: ffff888143789028 R08: 0000000000000000 R09: 0000000000000005
>> [   15.703861] R10: 0000000000000002 R11: 0000000000000004 R12: ffff88900e7b80c0
>> [   15.711825] R13: 0000000000000000 R14: ffff88907646ba80 R15: 0000000000000000
>> [   15.719789] FS:  0000000000000000(0000) GS:ffff88900f000000(0000) knlGS:0000000000000000
>> [   15.728821] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   15.735234] CR2: 0000000000000000 CR3: 000000114d40a000 CR4: 0000000000350ee0
>> [   15.743197] Call Trace:
>> [   15.745929]  irq_chip_compose_msi_msg+0x2e/0x40
>> [   15.750984]  msi_domain_activate+0x4b/0x90
>> [   15.755556]  __irq_domain_activate_irq+0x53/0x80
>> [   15.760707]  ? irq_set_msi_desc_off+0x5a/0x90
>> [   15.765568]  irq_domain_activate_irq+0x25/0x40
>> [   15.770525]  __msi_domain_alloc_irqs+0x16a/0x310
>> [   15.775680]  __pci_enable_msi_range+0x182/0x2b0
>> [   15.780738]  ? e820__memblock_setup+0x7d/0x7d
>> [   15.785597]  pci_enable_msi+0x16/0x30
>> [   15.789685]  iommu_init_msi+0x30/0x190
>> [   15.793860]  state_next+0x39d/0x665
>> [   15.797754]  ? e820__memblock_setup+0x7d/0x7d
>> [   15.802615]  iommu_go_to_state+0x24/0x28
>> [   15.806993]  amd_iommu_init+0x11/0x46
>> [   15.811076]  pci_iommu_init+0x16/0x3f
> 
> Oh joy.
> 
> It's asking the core code to generate a PCI MSI message for it and
> actually program that to the PCI device (since the IOMMU itself is a
> PCI device).
> 
> That isn't actually used for generating MSI, but is instead interpreted
> to write the intcapxt registers which *do* generate the interrupts.
> 
> That wants fixing, preferably not to go via MSI format at all, or maybe
> just to use the 'dmar' flag to __irq_msi_compose_msg(). Either way by
> having an irqdomain of its own like the Intel IOMMU does.
> 
> If I could get post-5.5 kernels to boot at all with the AMD IOMMU
> enabled, I'd have a go at throwing that together now...

That's odd. I haven't had an issue with or heard of any issues with
booting post-5.5 kernels with the AMD IOMMU enabled. What kind of problems
are you encountering?

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [tip: x86/apic] x86/ioapic: Correct the PCI/ISA trigger type selection
  2020-11-10 14:34                             ` Thomas Gleixner
  2020-11-10 14:55                               ` Tom Lendacky
@ 2020-11-10 17:47                               ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 192+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-11-10 17:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Tom Lendacky, Borislav Petkov, Qian Cai, Thomas Gleixner,
	David Woodhouse, x86, linux-kernel

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     aec8da04e4d71afdd4ab3025ea34a6517435f363
Gitweb:        https://git.kernel.org/tip/aec8da04e4d71afdd4ab3025ea34a6517435f363
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Tue, 10 Nov 2020 15:34:32 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 10 Nov 2020 18:43:22 +01:00

x86/ioapic: Correct the PCI/ISA trigger type selection

PCI's default trigger type is level and ISA's is edge. The recent
refactoring made it the other way round, which went unnoticed as it seems
only to cause havoc on some AMD systems.

Make the comment and code do the right thing again.

Fixes: a27dca645d2c ("x86/io_apic: Cleanup trigger/polarity helpers")
Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Qian Cai <cai@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/87d00lgu13.fsf@nanos.tec.linutronix.de
---
 arch/x86/kernel/apic/io_apic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 0602c95..089e755 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -809,9 +809,9 @@ static bool irq_is_level(int idx)
 	case MP_IRQTRIG_DEFAULT:
 		/*
 		 * Conforms to spec, ie. bus-type dependent trigger
-		 * mode. PCI defaults to egde, ISA to level.
+		 * mode. PCI defaults to level, ISA to edge.
 		 */
-		level = test_bit(bus, mp_bus_not_pci);
+		level = !test_bit(bus, mp_bus_not_pci);
 		/* Take EISA into account */
 		return eisa_irq_is_level(idx, bus, level);
 	case MP_IRQTRIG_EDGE:

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 16:33                                     ` [EXTERNAL] " David Woodhouse
  2020-11-10 17:04                                       ` Tom Lendacky
@ 2020-11-10 17:50                                       ` Thomas Gleixner
  2020-11-10 18:56                                         ` Thomas Gleixner
  1 sibling, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-11-10 17:50 UTC (permalink / raw)
  To: David Woodhouse, Tom Lendacky, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

On Tue, Nov 10 2020 at 16:33, David Woodhouse wrote:
> On Tue, 2020-11-10 at 10:17 -0600, Tom Lendacky wrote:
>> Yep. The warning started triggering with:
>> 47bea873cf80 ("x86/msi: Only use high bits of MSI address for DMAR unit")
>> 
>> Here's the backtrace:
>> 
>> [   15.745929]  irq_chip_compose_msi_msg+0x2e/0x40
>> [   15.750984]  msi_domain_activate+0x4b/0x90
>> [   15.755556]  __irq_domain_activate_irq+0x53/0x80
>> [   15.760707]  ? irq_set_msi_desc_off+0x5a/0x90
>> [   15.765568]  irq_domain_activate_irq+0x25/0x40
>> [   15.770525]  __msi_domain_alloc_irqs+0x16a/0x310
>> [   15.775680]  __pci_enable_msi_range+0x182/0x2b0
>> [   15.780738]  ? e820__memblock_setup+0x7d/0x7d
>> [   15.785597]  pci_enable_msi+0x16/0x30
>> [   15.789685]  iommu_init_msi+0x30/0x190
>
> It's asking the core code to generate a PCI MSI message for it and
> actually program that to the PCI device (since the IOMMU itself is a
> PCI device).
>
> That isn't actually used for generating MSI, but is instead interpreted
> to write the intcapxt registers which *do* generate the interrupts.
>
> That wants fixing, preferably not to go via MSI format at all, or maybe
> just to use the 'dmar' flag to __irq_msi_compose_msg(). Either way by
> having an irqdomain of its own like the Intel IOMMU does.
>
> If I could get post-5.5 kernels to boot at all with the AMD IOMMU
> enabled, I'd have a go at throwing that together now...

It can share the dmar domain code. Let me frob something.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 17:50                                       ` Thomas Gleixner
@ 2020-11-10 18:56                                         ` Thomas Gleixner
  2020-11-10 19:21                                           ` David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-11-10 18:56 UTC (permalink / raw)
  To: David Woodhouse, Tom Lendacky, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

On Tue, Nov 10 2020 at 18:50, Thomas Gleixner wrote:
> On Tue, Nov 10 2020 at 16:33, David Woodhouse wrote:
>> If I could get post-5.5 kernels to boot at all with the AMD IOMMU
>> enabled, I'd have a go at throwing that together now...
>
> It can share the dmar domain code. Let me frob something.

Not much to share there and I can't access my AMD machine at the
moment. Something like the untested below should work.

Thanks,

        tglx
---
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -523,7 +523,7 @@ struct msi_msg;
 struct irq_cfg;
 
 extern void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
-				  bool dmar);
+				  bool iommu);
 
 extern void ioapic_zap_locks(void);
 
--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -63,4 +63,6 @@ static inline void x86_create_pci_msi_do
 #define x86_pci_msi_default_domain	NULL
 #endif
 
+struct irq_domain *x86_create_iommu_msi_domain(void);
+
 #endif
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2498,7 +2498,7 @@ int hard_smp_processor_id(void)
 }
 
 void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
-			   bool dmar)
+			   bool iommu)
 {
 	memset(msg, 0, sizeof(*msg));
 
@@ -2519,7 +2519,7 @@ void __irq_msi_compose_msg(struct irq_cf
 	 * some hypervisors allow the extended destination ID field in bits
 	 * 5-11 to be used, giving support for 15 bits of APIC IDs in total.
 	 */
-	if (dmar)
+	if (iommu)
 		msg->arch_addr_hi.destid_8_31 = cfg->dest_apicid >> 8;
 	else if (virt_ext_dest_id && cfg->dest_apicid < 0x8000)
 		msg->arch_addr_lo.virt_destid_8_14 = cfg->dest_apicid >> 8;
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -184,7 +184,8 @@ static struct msi_domain_info pci_msi_do
 	.handler_name	= "edge",
 };
 
-struct irq_domain * __init native_create_pci_msi_domain(void)
+static struct irq_domain *__init
+__create_pci_msi_domain(struct msi_domain_info *info, const char *name)
 {
 	struct fwnode_handle *fn;
 	struct irq_domain *d;
@@ -192,21 +193,25 @@ struct irq_domain * __init native_create
 	if (disable_apic)
 		return NULL;
 
-	fn = irq_domain_alloc_named_fwnode("PCI-MSI");
+	fn = irq_domain_alloc_named_fwnode(name);
 	if (!fn)
 		return NULL;
 
-	d = pci_msi_create_irq_domain(fn, &pci_msi_domain_info,
-				      x86_vector_domain);
+	d = pci_msi_create_irq_domain(fn, info, x86_vector_domain);
 	if (!d) {
 		irq_domain_free_fwnode(fn);
-		pr_warn("Failed to initialize PCI-MSI irqdomain.\n");
+		pr_warn("Failed to initialize %s irqdomain.\n", name);
 	} else {
 		d->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
 	}
 	return d;
 }
 
+struct irq_domain * __init native_create_pci_msi_domain(void)
+{
+	return __create_pci_msi_domain(&pci_msi_domain_info, "PCI-MSI");
+}
+
 void __init x86_create_pci_msi_domain(void)
 {
 	x86_pci_msi_default_domain = x86_init.irqs.create_pci_msi_domain();
@@ -247,6 +252,43 @@ struct irq_domain *arch_create_remap_msi
 }
 #endif
 
+static void __maybe_unused iommu_msi_compose_msg(struct irq_data *data,
+						 struct msi_msg *msg)
+{
+	__irq_msi_compose_msg(irqd_cfg(data), msg, true);
+}
+
+#ifdef CONFIG_AMD_IOMMU
+/*
+ * Similar to the Intel IOMMU abuse below. The resulting irq domain is
+ * associated to the IOMMU pci device, so that pci_enable_msi() works.
+ */
+static struct irq_chip iommu_msi_controller = {
+	.name			= "IOMMU-MSI",
+	.irq_unmask		= pci_msi_unmask_irq,
+	.irq_mask		= pci_msi_mask_irq,
+	.irq_ack		= irq_chip_ack_parent,
+	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_set_affinity	= msi_set_affinity,
+	.irq_compose_msi_msg	= iommu_msi_compose_msg,
+	.flags			= IRQCHIP_SKIP_SET_WAKE,
+};
+
+static struct msi_domain_info iommu_msi_domain_info = {
+	.flags		= MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
+			  MSI_FLAG_PCI_MSIX,
+	.ops		= &pci_msi_domain_ops,
+	.chip		= &iommu_msi_controller,
+	.handler	= handle_edge_irq,
+	.handler_name	= "edge",
+};
+
+struct irq_domain __init *x86_create_iommu_msi_domain(void)
+{
+	return __create_pci_msi_domain(&iommu_msi_domain_info, "IOMMU-MSI");
+}
+#endif
+
 #ifdef CONFIG_DMAR_TABLE
 /*
  * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the
@@ -254,11 +296,6 @@ struct irq_domain *arch_create_remap_msi
  * case for MSIs as it would be targeting real memory above 4GiB not the
  * APIC.
  */
-static void dmar_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
-{
-	__irq_msi_compose_msg(irqd_cfg(data), msg, true);
-}
-
 static void dmar_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
 {
 	dmar_msi_write(data->irq, msg);
@@ -271,7 +308,7 @@ static struct irq_chip dmar_msi_controll
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_set_affinity	= msi_domain_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
-	.irq_compose_msi_msg	= dmar_msi_compose_msg,
+	.irq_compose_msi_msg	= iommu_msi_compose_msg,
 	.irq_write_msi_msg	= dmar_msi_write_msg,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1759,6 +1759,7 @@ static const struct attribute_group *amd
 
 static int __init iommu_init_pci(struct amd_iommu *iommu)
 {
+	static struct irq_domain *msidom;
 	int cap_ptr = iommu->cap_ptr;
 	int ret;
 
@@ -1767,6 +1768,13 @@ static int __init iommu_init_pci(struct
 	if (!iommu->dev)
 		return -ENODEV;
 
+	if (!msidom) {
+		msidom = x86_create_iommu_msi_domain();
+		if (!msidom)
+			return -ENODEV;
+	}
+	dev_set_msi_domain(&iommu->dev->dev, msidom);
+
 	/* Prevent binding other PCI device drivers to IOMMU devices */
 	iommu->dev->match_driver = false;
 

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 18:56                                         ` Thomas Gleixner
@ 2020-11-10 19:21                                           ` David Woodhouse
  2020-11-10 21:01                                             ` Thomas Gleixner
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-11-10 19:21 UTC (permalink / raw)
  To: Thomas Gleixner, Tom Lendacky, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel



On 10 November 2020 18:56:17 GMT, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Tue, Nov 10 2020 at 18:50, Thomas Gleixner wrote:
>> On Tue, Nov 10 2020 at 16:33, David Woodhouse wrote:
>>> If I could get post-5.5 kernels to boot at all with the AMD IOMMU
>>> enabled, I'd have a go at throwing that together now...
>>
>> It can share the dmar domain code. Let me frob something.
>
>Not much to share there and I can't access my AMD machine at the
>moment. Something like the untested below should work.

Does it even need its own irqdomain? Can it not just allocate directly from the vector domain then program its own register directly based on the irq_cfg?


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 19:21                                           ` David Woodhouse
@ 2020-11-10 21:01                                             ` Thomas Gleixner
  2020-11-10 21:30                                               ` David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-11-10 21:01 UTC (permalink / raw)
  To: David Woodhouse, Tom Lendacky, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

On Tue, Nov 10 2020 at 19:21, David Woodhouse wrote:

> On 10 November 2020 18:56:17 GMT, Thomas Gleixner <tglx@linutronix.de> wrote:
>>On Tue, Nov 10 2020 at 18:50, Thomas Gleixner wrote:
>>> On Tue, Nov 10 2020 at 16:33, David Woodhouse wrote:
>>>> If I could get post-5.5 kernels to boot at all with the AMD IOMMU
>>>> enabled, I'd have a go at throwing that together now...
>>>
>>> It can share the dmar domain code. Let me frob something.
>>
>>Not much to share there and I can't access my AMD machine at the
>>moment. Something like the untested below should work.
>
> Does it even need its own irqdomain? Can it not just allocate directly
> from the vector domain then program its own register directly based on
> the irq_cfg?

It uses pci_enable_msi() and I have no clue about that piece of hardware
and whether that is actually required or not. If it is, then it needs a
domain because that's what pci_enable_msi() uses.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 21:01                                             ` Thomas Gleixner
@ 2020-11-10 21:30                                               ` David Woodhouse
  2020-11-10 22:00                                                 ` Tom Lendacky
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-11-10 21:30 UTC (permalink / raw)
  To: Thomas Gleixner, Tom Lendacky, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel



On 10 November 2020 21:01:17 GMT, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Tue, Nov 10 2020 at 19:21, David Woodhouse wrote:
>
>> On 10 November 2020 18:56:17 GMT, Thomas Gleixner
><tglx@linutronix.de> wrote:
>>>On Tue, Nov 10 2020 at 18:50, Thomas Gleixner wrote:
>>>> On Tue, Nov 10 2020 at 16:33, David Woodhouse wrote:
>>>>> If I could get post-5.5 kernels to boot at all with the AMD IOMMU
>>>>> enabled, I'd have a go at throwing that together now...
>>>>
>>>> It can share the dmar domain code. Let me frob something.
>>>
>>>Not much to share there and I can't access my AMD machine at the
>>>moment. Something like the untested below should work.
>>
>> Does it even need its own irqdomain? Can it not just allocate
>directly
>> from the vector domain then program its own register directly based
>on
>> the irq_cfg?
>
>It uses pci_enable_msi() and I have no clue about that piece of
>hardware
>and whether that is actually required or not. If it is, then it needs a
>domain because that's what pci_enable_msi() uses.

I'd be kind of surprised if it is required, but testing on qemu is obviously not going to cut it. Tom?

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 21:30                                               ` David Woodhouse
@ 2020-11-10 22:00                                                 ` Tom Lendacky
  2020-11-10 22:48                                                   ` Thomas Gleixner
  0 siblings, 1 reply; 192+ messages in thread
From: Tom Lendacky @ 2020-11-10 22:00 UTC (permalink / raw)
  To: David Woodhouse, Thomas Gleixner, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

On 11/10/20 3:30 PM, David Woodhouse wrote:
> 
> 
> On 10 November 2020 21:01:17 GMT, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Tue, Nov 10 2020 at 19:21, David Woodhouse wrote:
>>
>>> On 10 November 2020 18:56:17 GMT, Thomas Gleixner
>> <tglx@linutronix.de> wrote:
>>>> On Tue, Nov 10 2020 at 18:50, Thomas Gleixner wrote:
>>>>> On Tue, Nov 10 2020 at 16:33, David Woodhouse wrote:
>>>>>> If I could get post-5.5 kernels to boot at all with the AMD IOMMU
>>>>>> enabled, I'd have a go at throwing that together now...
>>>>>
>>>>> It can share the dmar domain code. Let me frob something.
>>>>
>>>> Not much to share there and I can't access my AMD machine at the
>>>> moment. Something like the untested below should work.
>>>
>>> Does it even need its own irqdomain? Can it not just allocate
>> directly
>>> from the vector domain then program its own register directly based
>> on
>>> the irq_cfg?
>>
>> It uses pci_enable_msi() and I have no clue about that piece of
>> hardware
>> and whether that is actually required or not. If it is, then it needs a
>> domain because that's what pci_enable_msi() uses.
> 
> I'd be kind of surprised if it is required, but testing on qemu is obviously not going to cut it. Tom?

Was just in the process of testing it... I still get a warning. Here's
the new backtrace:

[   15.581115] WARNING: CPU: 6 PID: 1 at arch/x86/kernel/apic/apic.c:2527 __irq_msi_compose_msg+0x9f/0xb0
[   15.581115] Modules linked in:
[   15.581115] CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc3-sos-custom #1
[   15.581115] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS REX1006G 01/25/2020
[   15.581115] RIP: 0010:__irq_msi_compose_msg+0x9f/0xb0
[   15.581115] Code: 01 00 74 1e 3d ff 7f 00 00 77 1f 0f b7 16 c1 e8 08 83 e0 7f c1 e0 05 66 81 e2 1f f0 09 d0 66 89 06 c3 3d ff 00 00 00 77 01 c3 <0f> 0b c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00
[   15.581115] RSP: 0018:ffffc900000c7c18 EFLAGS: 00010012
[   15.581115] RAX: 0000000000000100 RBX: 0000000000000000 RCX: 0000000000000000
[   15.581115] RDX: 0000000000000000 RSI: ffffc900000c7c20 RDI: ffff8881088341c0
[   15.581115] RBP: ffff8881599e0428 R08: 0000000000000000 R09: ffffffffffffffff
[   15.581115] R10: 0000000000000003 R11: 0000000000000004 R12: ffff8881088341c0
[   15.581115] R13: 0000000000000000 R14: 0000000000000004 R15: ffff888108834180
[   15.581115] FS:  0000000000000000(0000) GS:ffff88900d380000(0000) knlGS:0000000000000000
[   15.581115] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.581115] CR2: 0000000000000000 CR3: 000000015240a000 CR4: 0000000000350ee0
[   15.581115] Call Trace:
[   15.581115]  irq_msi_update_msg+0x4d/0x80
[   15.581115]  msi_set_affinity+0x160/0x190
[   15.581115]  irq_do_set_affinity+0x52/0x190
[   15.581115]  irq_setup_affinity+0xd7/0x170
[   15.581115]  irq_startup+0x5d/0xf0
[   15.581115]  __setup_irq+0x6b9/0x700
[   15.581115]  request_threaded_irq+0xf8/0x160
[   15.581115]  ? irq_remapping_alloc+0x4d0/0x4d0
[   15.581115]  ? e820__memblock_setup+0x7d/0x7d
[   15.581115]  iommu_init_msi+0x60/0x190
[   15.581115]  state_next+0x39d/0x665
[   15.581115]  ? e820__memblock_setup+0x7d/0x7d
[   15.581115]  iommu_go_to_state+0x24/0x28
[   15.581115]  amd_iommu_init+0x11/0x46
[   15.581115]  pci_iommu_init+0x16/0x3f
[   15.581115]  do_one_initcall+0x44/0x1d0
[   15.581115]  kernel_init_freeable+0x1e7/0x249
[   15.581115]  ? rest_init+0xb4/0xb4
[   15.581115]  kernel_init+0xa/0x10c
[   15.581115]  ret_from_fork+0x22/0x30
[   15.581115] CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc3-sos-custom #1
[   15.581115] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS REX1006G 01/25/2020
[   15.581115] Call Trace:
[   15.581115]  dump_stack+0x6d/0x88
[   15.581115]  __warn.cold+0x24/0x3d
[   15.581115]  ? __irq_msi_compose_msg+0x9f/0xb0
[   15.581115]  report_bug+0xd1/0x100
[   15.581115]  handle_bug+0x35/0x80
[   15.581115]  exc_invalid_op+0x14/0x70
[   15.581115]  asm_exc_invalid_op+0x12/0x20
[   15.581115] RIP: 0010:__irq_msi_compose_msg+0x9f/0xb0
[   15.581115] Code: 01 00 74 1e 3d ff 7f 00 00 77 1f 0f b7 16 c1 e8 08 83 e0 7f c1 e0 05 66 81 e2 1f f0 09 d0 66 89 06 c3 3d ff 00 00 00 77 01 c3 <0f> 0b c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00
[   15.581115] RSP: 0018:ffffc900000c7c18 EFLAGS: 00010012
[   15.581115] RAX: 0000000000000100 RBX: 0000000000000000 RCX: 0000000000000000
[   15.581115] RDX: 0000000000000000 RSI: ffffc900000c7c20 RDI: ffff8881088341c0
[   15.581115] RBP: ffff8881599e0428 R08: 0000000000000000 R09: ffffffffffffffff
[   15.581115] R10: 0000000000000003 R11: 0000000000000004 R12: ffff8881088341c0
[   15.581115] R13: 0000000000000000 R14: 0000000000000004 R15: ffff888108834180
[   15.581115]  irq_msi_update_msg+0x4d/0x80
[   15.581115]  msi_set_affinity+0x160/0x190
[   15.581115]  irq_do_set_affinity+0x52/0x190
[   15.581115]  irq_setup_affinity+0xd7/0x170
[   15.581115]  irq_startup+0x5d/0xf0
[   15.581115]  __setup_irq+0x6b9/0x700
[   15.581115]  request_threaded_irq+0xf8/0x160
[   15.581115]  ? irq_remapping_alloc+0x4d0/0x4d0
[   15.581115]  ? e820__memblock_setup+0x7d/0x7d
[   15.581115]  iommu_init_msi+0x60/0x190
[   15.581115]  state_next+0x39d/0x665
[   15.581115]  ? e820__memblock_setup+0x7d/0x7d
[   15.581115]  iommu_go_to_state+0x24/0x28
[   15.581115]  amd_iommu_init+0x11/0x46
[   15.581115]  pci_iommu_init+0x16/0x3f
[   15.581115]  do_one_initcall+0x44/0x1d0
[   15.581115]  kernel_init_freeable+0x1e7/0x249
[   15.581115]  ? rest_init+0xb4/0xb4
[   15.581115]  kernel_init+0xa/0x10c
[   15.581115]  ret_from_fork+0x22/0x30
[   15.581115] ---[ end trace 05c9465e30ba6a20 ]---


Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 22:00                                                 ` Tom Lendacky
@ 2020-11-10 22:48                                                   ` Thomas Gleixner
  2020-11-10 23:05                                                     ` Tom Lendacky
  2020-11-11  8:16                                                     ` David Woodhouse
  0 siblings, 2 replies; 192+ messages in thread
From: Thomas Gleixner @ 2020-11-10 22:48 UTC (permalink / raw)
  To: Tom Lendacky, David Woodhouse, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

On Tue, Nov 10 2020 at 16:00, Tom Lendacky wrote:
> On 11/10/20 3:30 PM, David Woodhouse wrote:
> [   15.581115] WARNING: CPU: 6 PID: 1 at arch/x86/kernel/apic/apic.c:2527 __irq_msi_compose_msg+0x9f/0xb0
> [   15.581115] Call Trace:
> [   15.581115]  irq_msi_update_msg+0x4d/0x80
> [   15.581115]  msi_set_affinity+0x160/0x190

Duh. Yes, that want's some love as well. Delta patch below.

Thanks,

        tglx
---
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -24,10 +24,11 @@ struct irq_domain *x86_pci_msi_default_d
 
 static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 {
+	struct irq_chip *chip = irq_data_get_irq_chip(irqd);
 	struct msi_msg msg[2] = { [1] = { }, };
 
-	__irq_msi_compose_msg(cfg, msg, false);
-	irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
+	__irq_msi_compose_msg(cfg, msg, chip->flags & IRQCHIP_MSI_EXTID);
+	chip->irq_write_msi_msg(irqd, msg);
 }
 
 static int
@@ -271,7 +272,7 @@ static struct irq_chip iommu_msi_control
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.irq_set_affinity	= msi_set_affinity,
 	.irq_compose_msi_msg	= iommu_msi_compose_msg,
-	.flags			= IRQCHIP_SKIP_SET_WAKE,
+	.flags			= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_MSI_EXTID,
 };
 
 static struct msi_domain_info iommu_msi_domain_info = {
@@ -310,7 +311,7 @@ static struct irq_chip dmar_msi_controll
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.irq_compose_msi_msg	= iommu_msi_compose_msg,
 	.irq_write_msi_msg	= dmar_msi_write_msg,
-	.flags			= IRQCHIP_SKIP_SET_WAKE,
+	.flags			= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_MSI_EXTID,
 };
 
 static int dmar_msi_init(struct irq_domain *domain,
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -567,6 +567,8 @@ struct irq_chip {
  * IRQCHIP_SUPPORTS_NMI:              Chip can deliver NMIs, only for root irqchips
  * IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND:  Invokes __enable_irq()/__disable_irq() for wake irqs
  *                                    in the suspend path if they are in disabled state
+ * IRQCHIP_MSI_EXTID		      The MSI message created for this chip can
+ *				      have an otherwise forbidden extended ID
  */
 enum {
 	IRQCHIP_SET_TYPE_MASKED			= (1 <<  0),
@@ -579,6 +581,7 @@ enum {
 	IRQCHIP_SUPPORTS_LEVEL_MSI		= (1 <<  7),
 	IRQCHIP_SUPPORTS_NMI			= (1 <<  8),
 	IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND	= (1 <<  9),
+	IRQCHIP_MSI_EXTID			= (1 << 10),
 };
 
 #include <linux/irqdesc.h>

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 22:48                                                   ` Thomas Gleixner
@ 2020-11-10 23:05                                                     ` Tom Lendacky
  2020-11-11  8:16                                                     ` David Woodhouse
  1 sibling, 0 replies; 192+ messages in thread
From: Tom Lendacky @ 2020-11-10 23:05 UTC (permalink / raw)
  To: Thomas Gleixner, David Woodhouse, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

On 11/10/20 4:48 PM, Thomas Gleixner wrote:
> On Tue, Nov 10 2020 at 16:00, Tom Lendacky wrote:
>> On 11/10/20 3:30 PM, David Woodhouse wrote:
>> [   15.581115] WARNING: CPU: 6 PID: 1 at arch/x86/kernel/apic/apic.c:2527 __irq_msi_compose_msg+0x9f/0xb0
>> [   15.581115] Call Trace:
>> [   15.581115]  irq_msi_update_msg+0x4d/0x80
>> [   15.581115]  msi_set_affinity+0x160/0x190
> 
> Duh. Yes, that want's some love as well. Delta patch below.
> 
> Thanks,
> 
>         tglx

That fixed it.

Thanks,
Tom

> ---
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -24,10 +24,11 @@ struct irq_domain *x86_pci_msi_default_d
>  
>  static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
>  {
> +	struct irq_chip *chip = irq_data_get_irq_chip(irqd);
>  	struct msi_msg msg[2] = { [1] = { }, };
>  
> -	__irq_msi_compose_msg(cfg, msg, false);
> -	irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
> +	__irq_msi_compose_msg(cfg, msg, chip->flags & IRQCHIP_MSI_EXTID);
> +	chip->irq_write_msi_msg(irqd, msg);
>  }
>  
>  static int
> @@ -271,7 +272,7 @@ static struct irq_chip iommu_msi_control
>  	.irq_retrigger		= irq_chip_retrigger_hierarchy,
>  	.irq_set_affinity	= msi_set_affinity,
>  	.irq_compose_msi_msg	= iommu_msi_compose_msg,
> -	.flags			= IRQCHIP_SKIP_SET_WAKE,
> +	.flags			= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_MSI_EXTID,
>  };
>  
>  static struct msi_domain_info iommu_msi_domain_info = {
> @@ -310,7 +311,7 @@ static struct irq_chip dmar_msi_controll
>  	.irq_retrigger		= irq_chip_retrigger_hierarchy,
>  	.irq_compose_msi_msg	= iommu_msi_compose_msg,
>  	.irq_write_msi_msg	= dmar_msi_write_msg,
> -	.flags			= IRQCHIP_SKIP_SET_WAKE,
> +	.flags			= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_MSI_EXTID,
>  };
>  
>  static int dmar_msi_init(struct irq_domain *domain,
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -567,6 +567,8 @@ struct irq_chip {
>   * IRQCHIP_SUPPORTS_NMI:              Chip can deliver NMIs, only for root irqchips
>   * IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND:  Invokes __enable_irq()/__disable_irq() for wake irqs
>   *                                    in the suspend path if they are in disabled state
> + * IRQCHIP_MSI_EXTID		      The MSI message created for this chip can
> + *				      have an otherwise forbidden extended ID
>   */
>  enum {
>  	IRQCHIP_SET_TYPE_MASKED			= (1 <<  0),
> @@ -579,6 +581,7 @@ enum {
>  	IRQCHIP_SUPPORTS_LEVEL_MSI		= (1 <<  7),
>  	IRQCHIP_SUPPORTS_NMI			= (1 <<  8),
>  	IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND	= (1 <<  9),
> +	IRQCHIP_MSI_EXTID			= (1 << 10),
>  };
>  
>  #include <linux/irqdesc.h>
> 

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-10 22:48                                                   ` Thomas Gleixner
  2020-11-10 23:05                                                     ` Tom Lendacky
@ 2020-11-11  8:16                                                     ` David Woodhouse
  2020-11-11  9:46                                                       ` Thomas Gleixner
  1 sibling, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-11-11  8:16 UTC (permalink / raw)
  To: Thomas Gleixner, Tom Lendacky, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 882 bytes --]

On Tue, 2020-11-10 at 23:48 +0100, Thomas Gleixner wrote:
> + * IRQCHIP_MSI_EXTID                 The MSI message created for this chip can
> + *                                   have an otherwise forbidden extended ID

If we're going to do that then we could ditch the separate
iommu_compose_msi_msg() function too, right?

But actually I'd be more inclined to fix this differently, in a way
that doesn't still leave AMD's iommu_init_intcapxt() having to set use
irq_set_affinity_notifier() to update its own registers. That's icky.

Given that this is *its* irqdomain in the first place, it should just
sit at ->set_affinity() for itself, and call its parent as usual
without having to use a notifier.

We should also leave it using the basic PCI MSI support in the case
where the IOMMU doesn't have XTSUP support. It doesn't need its own
irqdomain for that.


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-11  8:16                                                     ` David Woodhouse
@ 2020-11-11  9:46                                                       ` Thomas Gleixner
  2020-11-11 10:36                                                         ` David Woodhouse
  2020-11-11 14:43                                                         ` [PATCH 1/3] iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled David Woodhouse
  0 siblings, 2 replies; 192+ messages in thread
From: Thomas Gleixner @ 2020-11-11  9:46 UTC (permalink / raw)
  To: David Woodhouse, Tom Lendacky, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

On Wed, Nov 11 2020 at 08:16, David Woodhouse wrote:
> On Tue, 2020-11-10 at 23:48 +0100, Thomas Gleixner wrote:
>> + * IRQCHIP_MSI_EXTID                 The MSI message created for this chip can
>> + *                                   have an otherwise forbidden extended ID
>
> If we're going to do that then we could ditch the separate
> iommu_compose_msi_msg() function too, right?
>
> But actually I'd be more inclined to fix this differently, in a way
> that doesn't still leave AMD's iommu_init_intcapxt() having to set use
> irq_set_affinity_notifier() to update its own registers. That's icky.
>
> Given that this is *its* irqdomain in the first place, it should just
> sit at ->set_affinity() for itself, and call its parent as usual
> without having to use a notifier.
>
> We should also leave it using the basic PCI MSI support in the case
> where the IOMMU doesn't have XTSUP support. It doesn't need its own
> irqdomain for that.

Looking at it now with brain awake, the XTSUP stuff is pretty much
the same as DMAR, which I didn't realize yesterday. The affinity
notifier muck is not needed when we have a write_msg() function which
twiddles the bits into those other locations.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-11  9:46                                                       ` Thomas Gleixner
@ 2020-11-11 10:36                                                         ` David Woodhouse
  2020-11-11 12:32                                                           ` David Woodhouse
  2020-11-11 14:43                                                         ` [PATCH 1/3] iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled David Woodhouse
  1 sibling, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-11-11 10:36 UTC (permalink / raw)
  To: Thomas Gleixner, Tom Lendacky, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 8056 bytes --]

On Wed, 2020-11-11 at 10:46 +0100, Thomas Gleixner wrote:
> Looking at it now with brain awake, the XTSUP stuff is pretty much
> the same as DMAR, which I didn't realize yesterday. The affinity
> notifier muck is not needed when we have a write_msg() function which
> twiddles the bits into those other locations.

I kind of hate the fact that it's swizzling those bits through invalid
MSI messages, so I did it as its own irqdomain using
irq_domain_create_hierarchy() directly instead of
msi_create_irq_domain().

Looks something like this, although I believe I'm missing a call to
irq_domain_set_info() somewhere which means that the irq_chip
'intcapxt_controller' isn't being used at all...


diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 90a8add186e0..b68fe10aa67d 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -16,6 +16,7 @@
 #include <linux/syscore_ops.h>
 #include <linux/interrupt.h>
 #include <linux/msi.h>
+#include <linux/irq.h>
 #include <linux/amd-iommu.h>
 #include <linux/export.h>
 #include <linux/kmemleak.h>
@@ -1563,8 +1564,7 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
 		 * for MSI address lo/hi/data, we need to check both
 		 * EFR[XtSup] and EFR[MsiCapMmioSup] for x2APIC support.
 		 */
-		if ((h->efr_reg & BIT(IOMMU_EFR_XTSUP_SHIFT)) &&
-		    (h->efr_reg & BIT(IOMMU_EFR_MSICAPMMIOSUP_SHIFT)))
+		if (h->efr_reg & BIT(IOMMU_EFR_XTSUP_SHIFT))
 			amd_iommu_xt_mode = IRQ_REMAP_X2APIC_MODE;
 		break;
 	default:
@@ -1978,28 +1978,27 @@ union intcapxt {
 		destid_24_31		:  8;
 } __attribute__ ((packed));
 
-/*
- * Setup the IntCapXT registers with interrupt routing information
- * based on the PCI MSI capability block registers, accessed via
- * MMIO MSI address low/hi and MSI data registers.
- */
-static void iommu_update_intcapxt(struct amd_iommu *iommu)
+/* We weren't actually doing anything before. Should we? */
+static void intcapxt_unmask_irq(struct irq_data *data)
 {
-	struct msi_msg msg;
-	union intcapxt xt;
-	u32 destid;
+}
+static void intcapxt_mask_irq(struct irq_data *data)
+{
+}
 
-	msg.address_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET);
-	msg.address_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET);
-	msg.data = readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET);
 
-	destid = x86_msi_msg_get_destid(&msg, x2apic_enabled());
+static int intcapxt_irqdomain_activate(struct irq_domain *domain,
+				       struct irq_data *irqd, bool reserve)
+{
+	struct amd_iommu *iommu = domain->host_data;
+	struct irq_cfg *cfg = irqd_cfg(irqd);
+	union intcapxt xt;
 
 	xt.capxt = 0ULL;
-	xt.dest_mode_logical = msg.arch_data.dest_mode_logical;
-	xt.vector = msg.arch_data.vector;
-	xt.destid_0_23 = destid & GENMASK(23, 0);
-	xt.destid_24_31 = destid >> 24;
+	xt.dest_mode_logical = apic->dest_mode_logical;
+	xt.vector = cfg->vector;
+	xt.destid_0_23 = cfg->dest_apicid & GENMASK(23, 0);
+	xt.destid_24_31 = cfg->dest_apicid >> 24;
 
 	/**
 	 * Current IOMMU implemtation uses the same IRQ for all
@@ -2008,64 +2007,117 @@ static void iommu_update_intcapxt(struct amd_iommu *iommu)
 	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
 	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
 	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
+
+	return 0;
 }
 
-static void _irq_notifier_notify(struct irq_affinity_notify *notify,
-				 const cpumask_t *mask)
+static void intcapxt_irqdomain_deactivate(struct irq_domain *domain,
+					  struct irq_data *irqd)
 {
-	struct amd_iommu *iommu;
+	intcapxt_mask_irq(irqd);
+}
 
-	for_each_iommu(iommu) {
-		if (iommu->dev->irq == notify->irq) {
-			iommu_update_intcapxt(iommu);
-			break;
-		}
-	}
+
+static int intcapxt_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
+				    unsigned int nr_irqs, void *arg)
+{
+	return irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
 }
 
-static void _irq_notifier_release(struct kref *ref)
+static void intcapxt_irqdomain_free(struct irq_domain *domain, unsigned int virq,
+				    unsigned int nr_irqs)
 {
+	irq_domain_free_irqs_top(domain, virq, nr_irqs);
 }
 
-static int iommu_init_intcapxt(struct amd_iommu *iommu)
+static int intcapxt_set_affinity(struct irq_data *irqd,
+				 const struct cpumask *mask, bool force)
 {
+	struct irq_data *parent = irqd->parent_data;
 	int ret;
-	struct irq_affinity_notify *notify = &iommu->intcapxt_notify;
 
-	/**
-	 * IntCapXT requires XTSup=1 and MsiCapMmioSup=1,
-	 * which can be inferred from amd_iommu_xt_mode.
-	 */
-	if (amd_iommu_xt_mode != IRQ_REMAP_X2APIC_MODE)
-		return 0;
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+		return ret;
 
-	/**
-	 * Also, we need to setup notifier to update the IntCapXT registers
-	 * whenever the irq affinity is changed from user-space.
-	 */
-	notify->irq = iommu->dev->irq;
-	notify->notify = _irq_notifier_notify,
-	notify->release = _irq_notifier_release,
-	ret = irq_set_affinity_notifier(iommu->dev->irq, notify);
+	return intcapxt_irqdomain_activate(irqd->domain, irqd, false);
+}
+
+static struct irq_chip intcapxt_controller = {
+	.name                   = "IOMMU-MSI",
+	.irq_unmask             = intcapxt_unmask_irq,
+	.irq_mask               = intcapxt_mask_irq,
+	.irq_ack                = irq_chip_ack_parent,
+	.irq_retrigger          = irq_chip_retrigger_hierarchy,
+	.irq_set_affinity       = intcapxt_set_affinity,
+	.flags                  = IRQCHIP_SKIP_SET_WAKE,
+};
+
+static const struct irq_domain_ops intcapxt_domain_ops = {
+	.alloc			= intcapxt_irqdomain_alloc,
+	.free			= intcapxt_irqdomain_free,
+	.activate		= intcapxt_irqdomain_activate,
+	.deactivate		= intcapxt_irqdomain_deactivate,
+};
+
+static struct irq_domain *iommu_get_irqdomain(struct amd_iommu *iommu)
+{
+	struct irq_domain *domain = NULL;
+	struct fwnode_handle *fn;
+
+	fn = irq_domain_alloc_named_id_fwnode("AMD-Vi-MSI", iommu->index);
+	if (!fn)
+		return NULL;
+
+	domain = irq_domain_create_hierarchy(x86_vector_domain, 0, 0,
+					     fn, &intcapxt_domain_ops,
+					     iommu);
+	if (!domain)
+		irq_domain_free_fwnode(fn);
+
+	return domain;
+}
+
+static int iommu_setup_intcapxt(struct amd_iommu *iommu)
+{
+	struct irq_domain *domain;
+	int irq, ret;
+
+	domain = iommu_get_irqdomain(iommu);
+	if (!domain)
+		return -ENXIO;
+
+	irq = irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, NULL);
+	if (irq < 0) {
+		irq_domain_remove(domain);
+		return irq;
+	}
+
+	ret = request_threaded_irq(irq,
+				   amd_iommu_int_handler,
+				   amd_iommu_int_thread,
+				   0, "AMD-Vi",
+				   iommu);
 	if (ret) {
-		pr_err("Failed to register irq affinity notifier (devid=%#x, irq %d)\n",
-		       iommu->devid, iommu->dev->irq);
+		irq_domain_free_irqs(irq, 1);
+		irq_domain_remove(domain);
 		return ret;
 	}
 
-	iommu_update_intcapxt(iommu);
 	iommu_feature_enable(iommu, CONTROL_INTCAPXT_EN);
-	return ret;
+	return 0;
 }
 
-static int iommu_init_msi(struct amd_iommu *iommu)
+static int iommu_init_irq(struct amd_iommu *iommu)
 {
 	int ret;
 
 	if (iommu->int_enabled)
 		goto enable_faults;
 
-	if (iommu->dev->msi_cap)
+	if (amd_iommu_xt_mode == IRQ_REMAP_X2APIC_MODE)
+		ret = iommu_setup_intcapxt(iommu);
+	else if (iommu->dev->msi_cap)
 		ret = iommu_setup_msi(iommu);
 	else
 		ret = -ENODEV;
@@ -2074,10 +2126,6 @@ static int iommu_init_msi(struct amd_iommu *iommu)
 		return ret;
 
 enable_faults:
-	ret = iommu_init_intcapxt(iommu);
-	if (ret)
-		return ret;
-
 	iommu_feature_enable(iommu, CONTROL_EVT_INT_EN);
 
 	if (iommu->ppr_log != NULL)
@@ -2700,7 +2748,7 @@ static int amd_iommu_enable_interrupts(void)
 	int ret = 0;
 
 	for_each_iommu(iommu) {
-		ret = iommu_init_msi(iommu);
+		ret = iommu_init_irq(iommu);
 		if (ret)
 			goto out;
 	}

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-11 10:36                                                         ` David Woodhouse
@ 2020-11-11 12:32                                                           ` David Woodhouse
  2020-11-11 20:30                                                             ` Tom Lendacky
  0 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-11-11 12:32 UTC (permalink / raw)
  To: Thomas Gleixner, Tom Lendacky, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 708 bytes --]

On Wed, 2020-11-11 at 10:36 +0000, David Woodhouse wrote:
> On Wed, 2020-11-11 at 10:46 +0100, Thomas Gleixner wrote:
> > Looking at it now with brain awake, the XTSUP stuff is pretty much
> > the same as DMAR, which I didn't realize yesterday. The affinity
> > notifier muck is not needed when we have a write_msg() function which
> > twiddles the bits into those other locations.
> 
> I kind of hate the fact that it's swizzling those bits through invalid
> MSI messages, so I did it as its own irqdomain using
> irq_domain_create_hierarchy() directly instead of
> msi_create_irq_domain().

Please give this a spin:

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/amdvi

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH 1/3] iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled
  2020-11-11  9:46                                                       ` Thomas Gleixner
  2020-11-11 10:36                                                         ` David Woodhouse
@ 2020-11-11 14:43                                                         ` David Woodhouse
  2020-11-11 14:43                                                           ` [PATCH 2/3] iommu/amd: Fix union of bitfields in intcapxt support David Woodhouse
                                                                             ` (2 more replies)
  1 sibling, 3 replies; 192+ messages in thread
From: David Woodhouse @ 2020-11-11 14:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tom Lendacky, Borislav Petkov, linux-kernel, x86, Qian Cai, Joerg Roedel

From: David Woodhouse <dwmw@amazon.co.uk>

This was potentially allowing I/OAPIC and MSI interrupts to be parented
in the IOMMU IR domain even when IR was disabled. Don't do that.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/iommu/amd/init.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 263670d36fed..90a8add186e0 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1601,9 +1601,11 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
 	if (ret)
 		return ret;
 
-	ret = amd_iommu_create_irq_domain(iommu);
-	if (ret)
-		return ret;
+	if (amd_iommu_irq_remap) {
+		ret = amd_iommu_create_irq_domain(iommu);
+		if (ret)
+			return ret;
+	}
 
 	/*
 	 * Make sure IOMMU is not considered to translate itself. The IVRS
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 2/3] iommu/amd: Fix union of bitfields in intcapxt support
  2020-11-11 14:43                                                         ` [PATCH 1/3] iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled David Woodhouse
@ 2020-11-11 14:43                                                           ` David Woodhouse
  2020-11-11 22:06                                                             ` [tip: x86/apic] " tip-bot2 for David Woodhouse
  2020-11-11 14:43                                                           ` [PATCH 3/3] iommu/amd: Fix IOMMU interrupt generation in X2APIC mode David Woodhouse
  2020-11-11 22:06                                                           ` [tip: x86/apic] iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled tip-bot2 for David Woodhouse
  2 siblings, 1 reply; 192+ messages in thread
From: David Woodhouse @ 2020-11-11 14:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tom Lendacky, Borislav Petkov, linux-kernel, x86, Qian Cai, Joerg Roedel

From: David Woodhouse <dwmw@amazon.co.uk>

All the bitfields in here are overlaid on top of each other since
they're a union. Change the second u64 to be in a struct so it does
the intended thing.

Fixes: b5c3786ee3704 ("iommu/amd: Use msi_msg shadow structs")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/iommu/amd/init.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 90a8add186e0..a94b96f1e13a 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1969,13 +1969,15 @@ static int iommu_setup_msi(struct amd_iommu *iommu)
 
 union intcapxt {
 	u64	capxt;
-	u64	reserved_0		:  2,
-		dest_mode_logical	:  1,
-		reserved_1		:  5,
-		destid_0_23		: 24,
-		vector			:  8,
-		reserved_2		: 16,
-		destid_24_31		:  8;
+	struct {
+		u64	reserved_0		:  2,
+			dest_mode_logical	:  1,
+			reserved_1		:  5,
+			destid_0_23		: 24,
+			vector			:  8,
+			reserved_2		: 16,
+			destid_24_31		:  8;
+	};
 } __attribute__ ((packed));
 
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 3/3] iommu/amd: Fix IOMMU interrupt generation in X2APIC mode
  2020-11-11 14:43                                                         ` [PATCH 1/3] iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled David Woodhouse
  2020-11-11 14:43                                                           ` [PATCH 2/3] iommu/amd: Fix union of bitfields in intcapxt support David Woodhouse
@ 2020-11-11 14:43                                                           ` David Woodhouse
  2020-11-11 22:06                                                           ` [tip: x86/apic] iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled tip-bot2 for David Woodhouse
  2 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-11-11 14:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tom Lendacky, Borislav Petkov, linux-kernel, x86, Qian Cai, Joerg Roedel

From: David Woodhouse <dwmw@amazon.co.uk>

The AMD IOMMU has two modes for generating its own interrupts.

The first is very much based on PCI MSI, and can be configured by Linux
precisely that way. But like legacy unmapped PCI MSI it's limited to
8 bits of APIC ID.

The second method does not use PCI MSI at all in hardawre, and instead
configures the INTCAPXT registers in the IOMMU directly with the APIC ID
and vector.

In the latter case, the IOMMU driver would still use pci_enable_msi(),
read back (through MMIO) the MSI message that Linux wrote to the PCI MSI
table, then swizzle those bits into the appropriate register.

Historically, this worked because__irq_compose_msi_msg() would silently
generate an invalid MSI message with the high bits of the APIC ID in the
high bits of the MSI address. That hack was intended only for the Intel
IOMMU, and I recently enforced that, introducing a warning in
__irq_msi_compose_msg() if it was invoked with an APIC ID above 255.

Fix the AMD IOMMU not to depend on that hack any more, by having its own
irqdomain and directly putting the bits from the irq_cfg into the right
place in its ->activate() method.

Fixes: 47bea873cf80 "x86/msi: Only use high bits of MSI address for DMAR unit")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/hw_irq.h |   1 +
 drivers/iommu/amd/init.c      | 190 +++++++++++++++++++++++-----------
 2 files changed, 132 insertions(+), 59 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 458f5a676402..d465ece58151 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -39,6 +39,7 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_PCI_MSI,
 	X86_IRQ_ALLOC_TYPE_PCI_MSIX,
 	X86_IRQ_ALLOC_TYPE_DMAR,
+	X86_IRQ_ALLOC_TYPE_AMDVI,
 	X86_IRQ_ALLOC_TYPE_UV,
 };
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index a94b96f1e13a..a57400a73abb 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -16,6 +16,7 @@
 #include <linux/syscore_ops.h>
 #include <linux/interrupt.h>
 #include <linux/msi.h>
+#include <linux/irq.h>
 #include <linux/amd-iommu.h>
 #include <linux/export.h>
 #include <linux/kmemleak.h>
@@ -1557,14 +1558,7 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
 			break;
 		}
 
-		/*
-		 * Note: Since iommu_update_intcapxt() leverages
-		 * the IOMMU MMIO access to MSI capability block registers
-		 * for MSI address lo/hi/data, we need to check both
-		 * EFR[XtSup] and EFR[MsiCapMmioSup] for x2APIC support.
-		 */
-		if ((h->efr_reg & BIT(IOMMU_EFR_XTSUP_SHIFT)) &&
-		    (h->efr_reg & BIT(IOMMU_EFR_MSICAPMMIOSUP_SHIFT)))
+		if (h->efr_reg & BIT(IOMMU_EFR_XTSUP_SHIFT))
 			amd_iommu_xt_mode = IRQ_REMAP_X2APIC_MODE;
 		break;
 	default:
@@ -1981,27 +1975,32 @@ union intcapxt {
 } __attribute__ ((packed));
 
 /*
- * Setup the IntCapXT registers with interrupt routing information
- * based on the PCI MSI capability block registers, accessed via
- * MMIO MSI address low/hi and MSI data registers.
+ * There isn't really any need to mask/unmask at the irqchip level because
+ * the 64-bit INTCAPXT registers can be updated atomically without tearing
+ * when the affinity is being updated.
  */
-static void iommu_update_intcapxt(struct amd_iommu *iommu)
+static void intcapxt_unmask_irq(struct irq_data *data)
 {
-	struct msi_msg msg;
-	union intcapxt xt;
-	u32 destid;
+}
 
-	msg.address_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET);
-	msg.address_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET);
-	msg.data = readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET);
+static void intcapxt_mask_irq(struct irq_data *data)
+{
+}
 
-	destid = x86_msi_msg_get_destid(&msg, x2apic_enabled());
+static struct irq_chip intcapxt_controller;
+
+static int intcapxt_irqdomain_activate(struct irq_domain *domain,
+				       struct irq_data *irqd, bool reserve)
+{
+	struct amd_iommu *iommu = irqd->chip_data;
+	struct irq_cfg *cfg = irqd_cfg(irqd);
+	union intcapxt xt;
 
 	xt.capxt = 0ULL;
-	xt.dest_mode_logical = msg.arch_data.dest_mode_logical;
-	xt.vector = msg.arch_data.vector;
-	xt.destid_0_23 = destid & GENMASK(23, 0);
-	xt.destid_24_31 = destid >> 24;
+	xt.dest_mode_logical = apic->dest_mode_logical;
+	xt.vector = cfg->vector;
+	xt.destid_0_23 = cfg->dest_apicid & GENMASK(23, 0);
+	xt.destid_24_31 = cfg->dest_apicid >> 24;
 
 	/**
 	 * Current IOMMU implemtation uses the same IRQ for all
@@ -2010,64 +2009,141 @@ static void iommu_update_intcapxt(struct amd_iommu *iommu)
 	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
 	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
 	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
+	return 0;
 }
 
-static void _irq_notifier_notify(struct irq_affinity_notify *notify,
-				 const cpumask_t *mask)
+static void intcapxt_irqdomain_deactivate(struct irq_domain *domain,
+					  struct irq_data *irqd)
 {
-	struct amd_iommu *iommu;
+	intcapxt_mask_irq(irqd);
+}
 
-	for_each_iommu(iommu) {
-		if (iommu->dev->irq == notify->irq) {
-			iommu_update_intcapxt(iommu);
-			break;
-		}
+
+static int intcapxt_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
+				    unsigned int nr_irqs, void *arg)
+{
+	struct irq_alloc_info *info = arg;
+	int i, ret;
+
+	if (!info || info->type != X86_IRQ_ALLOC_TYPE_AMDVI)
+		return -EINVAL;
+
+	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+	if (ret < 0)
+		return ret;
+
+	for (i = virq; i < virq + nr_irqs; i++) {
+		struct irq_data *irqd = irq_domain_get_irq_data(domain, i);
+
+		irqd->chip = &intcapxt_controller;
+		irqd->chip_data = info->data;
 	}
+
+	return ret;
 }
 
-static void _irq_notifier_release(struct kref *ref)
+static void intcapxt_irqdomain_free(struct irq_domain *domain, unsigned int virq,
+				    unsigned int nr_irqs)
 {
+	irq_domain_free_irqs_top(domain, virq, nr_irqs);
 }
 
-static int iommu_init_intcapxt(struct amd_iommu *iommu)
+static int intcapxt_set_affinity(struct irq_data *irqd,
+				 const struct cpumask *mask, bool force)
 {
+	struct irq_data *parent = irqd->parent_data;
 	int ret;
-	struct irq_affinity_notify *notify = &iommu->intcapxt_notify;
 
-	/**
-	 * IntCapXT requires XTSup=1 and MsiCapMmioSup=1,
-	 * which can be inferred from amd_iommu_xt_mode.
-	 */
-	if (amd_iommu_xt_mode != IRQ_REMAP_X2APIC_MODE)
-		return 0;
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+		return ret;
 
-	/**
-	 * Also, we need to setup notifier to update the IntCapXT registers
-	 * whenever the irq affinity is changed from user-space.
-	 */
-	notify->irq = iommu->dev->irq;
-	notify->notify = _irq_notifier_notify,
-	notify->release = _irq_notifier_release,
-	ret = irq_set_affinity_notifier(iommu->dev->irq, notify);
+	return intcapxt_irqdomain_activate(irqd->domain, irqd, false);
+}
+
+static struct irq_chip intcapxt_controller = {
+	.name			= "IOMMU-MSI",
+	.irq_unmask		= intcapxt_unmask_irq,
+	.irq_mask		= intcapxt_mask_irq,
+	.irq_ack		= irq_chip_ack_parent,
+	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_set_affinity       = intcapxt_set_affinity,
+	.flags			= IRQCHIP_SKIP_SET_WAKE,
+};
+
+static const struct irq_domain_ops intcapxt_domain_ops = {
+	.alloc			= intcapxt_irqdomain_alloc,
+	.free			= intcapxt_irqdomain_free,
+	.activate		= intcapxt_irqdomain_activate,
+	.deactivate		= intcapxt_irqdomain_deactivate,
+};
+
+
+static struct irq_domain *iommu_irqdomain;
+
+static struct irq_domain *iommu_get_irqdomain(void)
+{
+	struct fwnode_handle *fn;
+
+	/* No need for locking here (yet) as the init is single-threaded */
+	if (iommu_irqdomain)
+		return iommu_irqdomain;
+
+	fn = irq_domain_alloc_named_fwnode("AMD-Vi-MSI");
+	if (!fn)
+		return NULL;
+
+	iommu_irqdomain = irq_domain_create_hierarchy(x86_vector_domain, 0, 0,
+						      fn, &intcapxt_domain_ops,
+						      NULL);
+	if (!iommu_irqdomain)
+		irq_domain_free_fwnode(fn);
+
+	return iommu_irqdomain;
+}
+
+static int iommu_setup_intcapxt(struct amd_iommu *iommu)
+{
+	struct irq_domain *domain;
+	struct irq_alloc_info info;
+	int irq, ret;
+
+	domain = iommu_get_irqdomain();
+	if (!domain)
+		return -ENXIO;
+
+	init_irq_alloc_info(&info, NULL);
+	info.type = X86_IRQ_ALLOC_TYPE_AMDVI;
+	info.data = iommu;
+
+	irq = irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
+	if (irq < 0) {
+		irq_domain_remove(domain);
+		return irq;
+	}
+
+	ret = request_threaded_irq(irq, amd_iommu_int_handler,
+				   amd_iommu_int_thread, 0, "AMD-Vi", iommu);
 	if (ret) {
-		pr_err("Failed to register irq affinity notifier (devid=%#x, irq %d)\n",
-		       iommu->devid, iommu->dev->irq);
+		irq_domain_free_irqs(irq, 1);
+		irq_domain_remove(domain);
 		return ret;
 	}
 
-	iommu_update_intcapxt(iommu);
 	iommu_feature_enable(iommu, CONTROL_INTCAPXT_EN);
-	return ret;
+	return 0;
 }
 
-static int iommu_init_msi(struct amd_iommu *iommu)
+static int iommu_init_irq(struct amd_iommu *iommu)
 {
 	int ret;
 
 	if (iommu->int_enabled)
 		goto enable_faults;
 
-	if (iommu->dev->msi_cap)
+	if (amd_iommu_xt_mode == IRQ_REMAP_X2APIC_MODE)
+		ret = iommu_setup_intcapxt(iommu);
+	else if (iommu->dev->msi_cap)
 		ret = iommu_setup_msi(iommu);
 	else
 		ret = -ENODEV;
@@ -2076,10 +2152,6 @@ static int iommu_init_msi(struct amd_iommu *iommu)
 		return ret;
 
 enable_faults:
-	ret = iommu_init_intcapxt(iommu);
-	if (ret)
-		return ret;
-
 	iommu_feature_enable(iommu, CONTROL_EVT_INT_EN);
 
 	if (iommu->ppr_log != NULL)
@@ -2702,7 +2774,7 @@ static int amd_iommu_enable_interrupts(void)
 	int ret = 0;
 
 	for_each_iommu(iommu) {
-		ret = iommu_init_msi(iommu);
+		ret = iommu_init_irq(iommu);
 		if (ret)
 			goto out;
 	}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-11 12:32                                                           ` David Woodhouse
@ 2020-11-11 20:30                                                             ` Tom Lendacky
  2020-11-11 21:19                                                               ` David Woodhouse
  2020-11-13 15:14                                                               ` David Woodhouse
  0 siblings, 2 replies; 192+ messages in thread
From: Tom Lendacky @ 2020-11-11 20:30 UTC (permalink / raw)
  To: David Woodhouse, Thomas Gleixner, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel, Suravee Suthikulpanit

On 11/11/20 6:32 AM, David Woodhouse wrote:
> On Wed, 2020-11-11 at 10:36 +0000, David Woodhouse wrote:
>> On Wed, 2020-11-11 at 10:46 +0100, Thomas Gleixner wrote:
>>> Looking at it now with brain awake, the XTSUP stuff is pretty much
>>> the same as DMAR, which I didn't realize yesterday. The affinity
>>> notifier muck is not needed when we have a write_msg() function which
>>> twiddles the bits into those other locations.
>>
>> I kind of hate the fact that it's swizzling those bits through invalid
>> MSI messages, so I did it as its own irqdomain using
>> irq_domain_create_hierarchy() directly instead of
>> msi_create_irq_domain().
> 
> Please give this a spin:
> 
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/amdvi

I had trouble cloning your tree for some reason, so just took the top
three patches and applied them to the tip tree. This all appears to be
working. I'll let the IOMMU experts take a closer look (adding Suravee).

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-11 20:30                                                             ` Tom Lendacky
@ 2020-11-11 21:19                                                               ` David Woodhouse
  2020-11-13 15:14                                                               ` David Woodhouse
  1 sibling, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-11-11 21:19 UTC (permalink / raw)
  To: Tom Lendacky, Thomas Gleixner, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel, Suravee Suthikulpanit

[-- Attachment #1: Type: text/plain, Size: 1289 bytes --]

On Wed, 2020-11-11 at 14:30 -0600, Tom Lendacky wrote:
> On 11/11/20 6:32 AM, David Woodhouse wrote:
> > On Wed, 2020-11-11 at 10:36 +0000, David Woodhouse wrote:
> > > On Wed, 2020-11-11 at 10:46 +0100, Thomas Gleixner wrote:
> > > > Looking at it now with brain awake, the XTSUP stuff is pretty much
> > > > the same as DMAR, which I didn't realize yesterday. The affinity
> > > > notifier muck is not needed when we have a write_msg() function which
> > > > twiddles the bits into those other locations.
> > > 
> > > I kind of hate the fact that it's swizzling those bits through invalid
> > > MSI messages, so I did it as its own irqdomain using
> > > irq_domain_create_hierarchy() directly instead of
> > > msi_create_irq_domain().
> > 
> > Please give this a spin:
> > 
> > https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/amdvi
> 
> I had trouble cloning your tree for some reason, so just took the top
> three patches and applied them to the tip tree. This all appears to be
> working. I'll let the IOMMU experts take a closer look (adding Suravee).

Thanks. Exercising it by actually triggering faults and observing the
IOMMU interrupt really working would be quite useful — on hardware with
both INTCAPXT and the older MSI delivery.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [tip: x86/apic] iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled
  2020-11-11 14:43                                                         ` [PATCH 1/3] iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled David Woodhouse
  2020-11-11 14:43                                                           ` [PATCH 2/3] iommu/amd: Fix union of bitfields in intcapxt support David Woodhouse
  2020-11-11 14:43                                                           ` [PATCH 3/3] iommu/amd: Fix IOMMU interrupt generation in X2APIC mode David Woodhouse
@ 2020-11-11 22:06                                                           ` tip-bot2 for David Woodhouse
  2 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-11-11 22:06 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     2df985f5e44c43f5d29d8cc3aaa8e8ac697e9de6
Gitweb:        https://git.kernel.org/tip/2df985f5e44c43f5d29d8cc3aaa8e8ac697e9de6
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Wed, 11 Nov 2020 14:43:20 
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 11 Nov 2020 23:01:58 +01:00

iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled

Registering the remapping irq domain unconditionally is potentially
allowing I/O-APIC and MSI interrupts to be parented in the IOMMU IR domain
even when IR is disabled. Don't do that.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201111144322.1659970-1-dwmw2@infradead.org

---
 drivers/iommu/amd/init.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index c2769f2..a94b96f 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1601,9 +1601,11 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
 	if (ret)
 		return ret;
 
-	ret = amd_iommu_create_irq_domain(iommu);
-	if (ret)
-		return ret;
+	if (amd_iommu_irq_remap) {
+		ret = amd_iommu_create_irq_domain(iommu);
+		if (ret)
+			return ret;
+	}
 
 	/*
 	 * Make sure IOMMU is not considered to translate itself. The IVRS

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/apic] iommu/amd: Fix union of bitfields in intcapxt support
  2020-11-11 14:43                                                           ` [PATCH 2/3] iommu/amd: Fix union of bitfields in intcapxt support David Woodhouse
@ 2020-11-11 22:06                                                             ` tip-bot2 for David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-11-11 22:06 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: David Woodhouse, Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     2fb6acf3edfeb904505f9ba3fd01166866062591
Gitweb:        https://git.kernel.org/tip/2fb6acf3edfeb904505f9ba3fd01166866062591
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Wed, 11 Nov 2020 14:43:21 
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 11 Nov 2020 23:01:57 +01:00

iommu/amd: Fix union of bitfields in intcapxt support

All the bitfields in here are overlaid on top of each other since
they're a union. Change the second u64 to be in a struct so it does
the intended thing.

Fixes: b5c3786ee370 ("iommu/amd: Use msi_msg shadow structs")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201111144322.1659970-2-dwmw2@infradead.org

---
 drivers/iommu/amd/init.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 263670d..c2769f2 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1967,13 +1967,15 @@ static int iommu_setup_msi(struct amd_iommu *iommu)
 
 union intcapxt {
 	u64	capxt;
-	u64	reserved_0		:  2,
-		dest_mode_logical	:  1,
-		reserved_1		:  5,
-		destid_0_23		: 24,
-		vector			:  8,
-		reserved_2		: 16,
-		destid_24_31		:  8;
+	struct {
+		u64	reserved_0		:  2,
+			dest_mode_logical	:  1,
+			reserved_1		:  5,
+			destid_0_23		: 24,
+			vector			:  8,
+			reserved_2		: 16,
+			destid_24_31		:  8;
+	};
 } __attribute__ ((packed));
 
 /*

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-11 20:30                                                             ` Tom Lendacky
  2020-11-11 21:19                                                               ` David Woodhouse
@ 2020-11-13 15:14                                                               ` David Woodhouse
  2020-11-16 18:02                                                                 ` David Woodhouse
                                                                                   ` (2 more replies)
  1 sibling, 3 replies; 192+ messages in thread
From: David Woodhouse @ 2020-11-13 15:14 UTC (permalink / raw)
  To: Tom Lendacky, Thomas Gleixner, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel, Suravee Suthikulpanit

[-- Attachment #1: Type: text/plain, Size: 10698 bytes --]

On Wed, 2020-11-11 at 14:30 -0600, Tom Lendacky wrote:
> I had trouble cloning your tree for some reason, so just took the top
> three patches and applied them to the tip tree. This all appears to be
> working. I'll let the IOMMU experts take a closer look (adding Suravee).

Thanks. I see Thomas has taken the first two into the tip.git x86/apic
branch already, so we're just looking for an ack on the third. Which is
this one...

From 49ee4fa51b8c06d14b7c4c74d15a7d76f865a8ea Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@amazon.co.uk>
Date: Wed, 11 Nov 2020 12:09:01 +0000
Subject: [PATCH] iommu/amd: Fix IOMMU interrupt generation in X2APIC mode

The AMD IOMMU has two modes for generating its own interrupts.

The first is very much based on PCI MSI, and can be configured by Linux
precisely that way. But like legacy unmapped PCI MSI it's limited to
8 bits of APIC ID.

The second method does not use PCI MSI at all in hardawre, and instead
configures the INTCAPXT registers in the IOMMU directly with the APIC ID
and vector.

In the latter case, the IOMMU driver would still use pci_enable_msi(),
read back (through MMIO) the MSI message that Linux wrote to the PCI MSI
table, then swizzle those bits into the appropriate register.

Historically, this worked because__irq_compose_msi_msg() would silently
generate an invalid MSI message with the high bits of the APIC ID in the
high bits of the MSI address. That hack was intended only for the Intel
IOMMU, and I recently enforced that, introducing a warning in
__irq_msi_compose_msg() if it was invoked with an APIC ID above 255.

Fix the AMD IOMMU not to depend on that hack any more, by having its own
irqdomain and directly putting the bits from the irq_cfg into the right
place in its ->activate() method.

Fixes: 47bea873cf80 "x86/msi: Only use high bits of MSI address for DMAR unit")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/hw_irq.h |   1 +
 drivers/iommu/amd/init.c      | 190 +++++++++++++++++++++++-----------
 2 files changed, 132 insertions(+), 59 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 458f5a676402..d465ece58151 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -39,6 +39,7 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_PCI_MSI,
 	X86_IRQ_ALLOC_TYPE_PCI_MSIX,
 	X86_IRQ_ALLOC_TYPE_DMAR,
+	X86_IRQ_ALLOC_TYPE_AMDVI,
 	X86_IRQ_ALLOC_TYPE_UV,
 };
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index a94b96f1e13a..a57400a73abb 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -16,6 +16,7 @@
 #include <linux/syscore_ops.h>
 #include <linux/interrupt.h>
 #include <linux/msi.h>
+#include <linux/irq.h>
 #include <linux/amd-iommu.h>
 #include <linux/export.h>
 #include <linux/kmemleak.h>
@@ -1557,14 +1558,7 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
 			break;
 		}
 
-		/*
-		 * Note: Since iommu_update_intcapxt() leverages
-		 * the IOMMU MMIO access to MSI capability block registers
-		 * for MSI address lo/hi/data, we need to check both
-		 * EFR[XtSup] and EFR[MsiCapMmioSup] for x2APIC support.
-		 */
-		if ((h->efr_reg & BIT(IOMMU_EFR_XTSUP_SHIFT)) &&
-		    (h->efr_reg & BIT(IOMMU_EFR_MSICAPMMIOSUP_SHIFT)))
+		if (h->efr_reg & BIT(IOMMU_EFR_XTSUP_SHIFT))
 			amd_iommu_xt_mode = IRQ_REMAP_X2APIC_MODE;
 		break;
 	default:
@@ -1981,27 +1975,32 @@ union intcapxt {
 } __attribute__ ((packed));
 
 /*
- * Setup the IntCapXT registers with interrupt routing information
- * based on the PCI MSI capability block registers, accessed via
- * MMIO MSI address low/hi and MSI data registers.
+ * There isn't really any need to mask/unmask at the irqchip level because
+ * the 64-bit INTCAPXT registers can be updated atomically without tearing
+ * when the affinity is being updated.
  */
-static void iommu_update_intcapxt(struct amd_iommu *iommu)
+static void intcapxt_unmask_irq(struct irq_data *data)
 {
-	struct msi_msg msg;
-	union intcapxt xt;
-	u32 destid;
+}
 
-	msg.address_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET);
-	msg.address_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET);
-	msg.data = readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET);
+static void intcapxt_mask_irq(struct irq_data *data)
+{
+}
 
-	destid = x86_msi_msg_get_destid(&msg, x2apic_enabled());
+static struct irq_chip intcapxt_controller;
+
+static int intcapxt_irqdomain_activate(struct irq_domain *domain,
+				       struct irq_data *irqd, bool reserve)
+{
+	struct amd_iommu *iommu = irqd->chip_data;
+	struct irq_cfg *cfg = irqd_cfg(irqd);
+	union intcapxt xt;
 
 	xt.capxt = 0ULL;
-	xt.dest_mode_logical = msg.arch_data.dest_mode_logical;
-	xt.vector = msg.arch_data.vector;
-	xt.destid_0_23 = destid & GENMASK(23, 0);
-	xt.destid_24_31 = destid >> 24;
+	xt.dest_mode_logical = apic->dest_mode_logical;
+	xt.vector = cfg->vector;
+	xt.destid_0_23 = cfg->dest_apicid & GENMASK(23, 0);
+	xt.destid_24_31 = cfg->dest_apicid >> 24;
 
 	/**
 	 * Current IOMMU implemtation uses the same IRQ for all
@@ -2010,64 +2009,141 @@ static void iommu_update_intcapxt(struct amd_iommu *iommu)
 	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
 	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
 	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
+	return 0;
 }
 
-static void _irq_notifier_notify(struct irq_affinity_notify *notify,
-				 const cpumask_t *mask)
+static void intcapxt_irqdomain_deactivate(struct irq_domain *domain,
+					  struct irq_data *irqd)
 {
-	struct amd_iommu *iommu;
+	intcapxt_mask_irq(irqd);
+}
 
-	for_each_iommu(iommu) {
-		if (iommu->dev->irq == notify->irq) {
-			iommu_update_intcapxt(iommu);
-			break;
-		}
+
+static int intcapxt_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
+				    unsigned int nr_irqs, void *arg)
+{
+	struct irq_alloc_info *info = arg;
+	int i, ret;
+
+	if (!info || info->type != X86_IRQ_ALLOC_TYPE_AMDVI)
+		return -EINVAL;
+
+	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+	if (ret < 0)
+		return ret;
+
+	for (i = virq; i < virq + nr_irqs; i++) {
+		struct irq_data *irqd = irq_domain_get_irq_data(domain, i);
+
+		irqd->chip = &intcapxt_controller;
+		irqd->chip_data = info->data;
 	}
+
+	return ret;
 }
 
-static void _irq_notifier_release(struct kref *ref)
+static void intcapxt_irqdomain_free(struct irq_domain *domain, unsigned int virq,
+				    unsigned int nr_irqs)
 {
+	irq_domain_free_irqs_top(domain, virq, nr_irqs);
 }
 
-static int iommu_init_intcapxt(struct amd_iommu *iommu)
+static int intcapxt_set_affinity(struct irq_data *irqd,
+				 const struct cpumask *mask, bool force)
 {
+	struct irq_data *parent = irqd->parent_data;
 	int ret;
-	struct irq_affinity_notify *notify = &iommu->intcapxt_notify;
 
-	/**
-	 * IntCapXT requires XTSup=1 and MsiCapMmioSup=1,
-	 * which can be inferred from amd_iommu_xt_mode.
-	 */
-	if (amd_iommu_xt_mode != IRQ_REMAP_X2APIC_MODE)
-		return 0;
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+		return ret;
 
-	/**
-	 * Also, we need to setup notifier to update the IntCapXT registers
-	 * whenever the irq affinity is changed from user-space.
-	 */
-	notify->irq = iommu->dev->irq;
-	notify->notify = _irq_notifier_notify,
-	notify->release = _irq_notifier_release,
-	ret = irq_set_affinity_notifier(iommu->dev->irq, notify);
+	return intcapxt_irqdomain_activate(irqd->domain, irqd, false);
+}
+
+static struct irq_chip intcapxt_controller = {
+	.name			= "IOMMU-MSI",
+	.irq_unmask		= intcapxt_unmask_irq,
+	.irq_mask		= intcapxt_mask_irq,
+	.irq_ack		= irq_chip_ack_parent,
+	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_set_affinity       = intcapxt_set_affinity,
+	.flags			= IRQCHIP_SKIP_SET_WAKE,
+};
+
+static const struct irq_domain_ops intcapxt_domain_ops = {
+	.alloc			= intcapxt_irqdomain_alloc,
+	.free			= intcapxt_irqdomain_free,
+	.activate		= intcapxt_irqdomain_activate,
+	.deactivate		= intcapxt_irqdomain_deactivate,
+};
+
+
+static struct irq_domain *iommu_irqdomain;
+
+static struct irq_domain *iommu_get_irqdomain(void)
+{
+	struct fwnode_handle *fn;
+
+	/* No need for locking here (yet) as the init is single-threaded */
+	if (iommu_irqdomain)
+		return iommu_irqdomain;
+
+	fn = irq_domain_alloc_named_fwnode("AMD-Vi-MSI");
+	if (!fn)
+		return NULL;
+
+	iommu_irqdomain = irq_domain_create_hierarchy(x86_vector_domain, 0, 0,
+						      fn, &intcapxt_domain_ops,
+						      NULL);
+	if (!iommu_irqdomain)
+		irq_domain_free_fwnode(fn);
+
+	return iommu_irqdomain;
+}
+
+static int iommu_setup_intcapxt(struct amd_iommu *iommu)
+{
+	struct irq_domain *domain;
+	struct irq_alloc_info info;
+	int irq, ret;
+
+	domain = iommu_get_irqdomain();
+	if (!domain)
+		return -ENXIO;
+
+	init_irq_alloc_info(&info, NULL);
+	info.type = X86_IRQ_ALLOC_TYPE_AMDVI;
+	info.data = iommu;
+
+	irq = irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
+	if (irq < 0) {
+		irq_domain_remove(domain);
+		return irq;
+	}
+
+	ret = request_threaded_irq(irq, amd_iommu_int_handler,
+				   amd_iommu_int_thread, 0, "AMD-Vi", iommu);
 	if (ret) {
-		pr_err("Failed to register irq affinity notifier (devid=%#x, irq %d)\n",
-		       iommu->devid, iommu->dev->irq);
+		irq_domain_free_irqs(irq, 1);
+		irq_domain_remove(domain);
 		return ret;
 	}
 
-	iommu_update_intcapxt(iommu);
 	iommu_feature_enable(iommu, CONTROL_INTCAPXT_EN);
-	return ret;
+	return 0;
 }
 
-static int iommu_init_msi(struct amd_iommu *iommu)
+static int iommu_init_irq(struct amd_iommu *iommu)
 {
 	int ret;
 
 	if (iommu->int_enabled)
 		goto enable_faults;
 
-	if (iommu->dev->msi_cap)
+	if (amd_iommu_xt_mode == IRQ_REMAP_X2APIC_MODE)
+		ret = iommu_setup_intcapxt(iommu);
+	else if (iommu->dev->msi_cap)
 		ret = iommu_setup_msi(iommu);
 	else
 		ret = -ENODEV;
@@ -2076,10 +2152,6 @@ static int iommu_init_msi(struct amd_iommu *iommu)
 		return ret;
 
 enable_faults:
-	ret = iommu_init_intcapxt(iommu);
-	if (ret)
-		return ret;
-
 	iommu_feature_enable(iommu, CONTROL_EVT_INT_EN);
 
 	if (iommu->ppr_log != NULL)
@@ -2702,7 +2774,7 @@ static int amd_iommu_enable_interrupts(void)
 	int ret = 0;
 
 	for_each_iommu(iommu) {
-		ret = iommu_init_msi(iommu);
+		ret = iommu_init_irq(iommu);
 		if (ret)
 			goto out;
 	}
-- 
2.17.1



[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-13 15:14                                                               ` David Woodhouse
@ 2020-11-16 18:02                                                                 ` David Woodhouse
  2020-11-17  2:00                                                                 ` Suravee Suthikulpanit
  2020-11-18 20:16                                                                 ` [tip: x86/apic] iommu/amd: Fix IOMMU interrupt generation in X2APIC mode tip-bot2 for David Woodhouse
  2 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-11-16 18:02 UTC (permalink / raw)
  To: Tom Lendacky, Thomas Gleixner, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel, Suravee Suthikulpanit

[-- Attachment #1: Type: text/plain, Size: 766 bytes --]

On Fri, 2020-11-13 at 15:14 +0000, David Woodhouse wrote:
> On Wed, 2020-11-11 at 14:30 -0600, Tom Lendacky wrote:
> > I had trouble cloning your tree for some reason, so just took the top
> > three patches and applied them to the tip tree. This all appears to be
> > working. I'll let the IOMMU experts take a closer look (adding Suravee).
> 
> Thanks. I see Thomas has taken the first two into the tip.git x86/apic
> branch already, so we're just looking for an ack on the third. Which is
> this one...
> 
> From 49ee4fa51b8c06d14b7c4c74d15a7d76f865a8ea Mon Sep 17 00:00:00 2001
> From: David Woodhouse <dwmw@amazon.co.uk>
> Date: Wed, 11 Nov 2020 12:09:01 +0000
> Subject: [PATCH] iommu/amd: Fix IOMMU interrupt generation in X2APIC mode

Ping?


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-13 15:14                                                               ` David Woodhouse
  2020-11-16 18:02                                                                 ` David Woodhouse
@ 2020-11-17  2:00                                                                 ` Suravee Suthikulpanit
  2020-11-18 10:29                                                                   ` Suravee Suthikulpanit
  2020-11-18 20:16                                                                 ` [tip: x86/apic] iommu/amd: Fix IOMMU interrupt generation in X2APIC mode tip-bot2 for David Woodhouse
  2 siblings, 1 reply; 192+ messages in thread
From: Suravee Suthikulpanit @ 2020-11-17  2:00 UTC (permalink / raw)
  To: David Woodhouse, Tom Lendacky, Thomas Gleixner, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

David,

On 11/13/20 10:14 PM, David Woodhouse wrote:
> On Wed, 2020-11-11 at 14:30 -0600, Tom Lendacky wrote:
>> I had trouble cloning your tree for some reason, so just took the top
>> three patches and applied them to the tip tree. This all appears to be
>> working. I'll let the IOMMU experts take a closer look (adding Suravee).
> 
> Thanks. I see Thomas has taken the first two into the tip.git x86/apic
> branch already, so we're just looking for an ack on the third. Which is
> this one...
> 
>  From 49ee4fa51b8c06d14b7c4c74d15a7d76f865a8ea Mon Sep 17 00:00:00 2001
> From: David Woodhouse <dwmw@amazon.co.uk>
> Date: Wed, 11 Nov 2020 12:09:01 +0000
> Subject: [PATCH] iommu/amd: Fix IOMMU interrupt generation in X2APIC mode
> 
> The AMD IOMMU has two modes for generating its own interrupts.
> 
> The first is very much based on PCI MSI, and can be configured by Linux
> precisely that way. But like legacy unmapped PCI MSI it's limited to
> 8 bits of APIC ID.
> 
> The second method does not use PCI MSI at all in hardawre, and instead
> configures the INTCAPXT registers in the IOMMU directly with the APIC ID
> and vector.
> 
> In the latter case, the IOMMU driver would still use pci_enable_msi(),
> read back (through MMIO) the MSI message that Linux wrote to the PCI MSI
> table, then swizzle those bits into the appropriate register.
> 
> Historically, this worked because__irq_compose_msi_msg() would silently
> generate an invalid MSI message with the high bits of the APIC ID in the
> high bits of the MSI address. That hack was intended only for the Intel
> IOMMU, and I recently enforced that, introducing a warning in
> __irq_msi_compose_msg() if it was invoked with an APIC ID above 255.
> 
> Fix the AMD IOMMU not to depend on that hack any more, by having its own
> irqdomain and directly putting the bits from the irq_cfg into the right
> place in its ->activate() method.
> 
> Fixes: 47bea873cf80 "x86/msi: Only use high bits of MSI address for DMAR unit")
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

I'm still working on testing this series using IO_PAGE_FAULT injection to trigger the IOMMU interrupts. I am still 
debugging some issues, and I'll keep you updated on the findings.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-17  2:00                                                                 ` Suravee Suthikulpanit
@ 2020-11-18 10:29                                                                   ` Suravee Suthikulpanit
  2020-11-18 10:52                                                                     ` David Woodhouse
  2020-11-18 14:06                                                                     ` Thomas Gleixner
  0 siblings, 2 replies; 192+ messages in thread
From: Suravee Suthikulpanit @ 2020-11-18 10:29 UTC (permalink / raw)
  To: David Woodhouse, Tom Lendacky, Thomas Gleixner, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

David

On 11/17/20 9:00 AM, Suravee Suthikulpanit wrote:
> David,
> 
> On 11/13/20 10:14 PM, David Woodhouse wrote:
>> On Wed, 2020-11-11 at 14:30 -0600, Tom Lendacky wrote:
>>> I had trouble cloning your tree for some reason, so just took the top
>>> three patches and applied them to the tip tree. This all appears to be
>>> working. I'll let the IOMMU experts take a closer look (adding Suravee).
>>
>> Thanks. I see Thomas has taken the first two into the tip.git x86/apic
>> branch already, so we're just looking for an ack on the third. Which is
>> this one...
>>
>>  From 49ee4fa51b8c06d14b7c4c74d15a7d76f865a8ea Mon Sep 17 00:00:00 2001
>> From: David Woodhouse <dwmw@amazon.co.uk>
>> Date: Wed, 11 Nov 2020 12:09:01 +0000
>> Subject: [PATCH] iommu/amd: Fix IOMMU interrupt generation in X2APIC mode
>>
>> The AMD IOMMU has two modes for generating its own interrupts.
>>
>> The first is very much based on PCI MSI, and can be configured by Linux
>> precisely that way. But like legacy unmapped PCI MSI it's limited to
>> 8 bits of APIC ID.
>>
>> The second method does not use PCI MSI at all in hardawre, and instead
>> configures the INTCAPXT registers in the IOMMU directly with the APIC ID
>> and vector.
>>
>> In the latter case, the IOMMU driver would still use pci_enable_msi(),
>> read back (through MMIO) the MSI message that Linux wrote to the PCI MSI
>> table, then swizzle those bits into the appropriate register.
>>
>> Historically, this worked because__irq_compose_msi_msg() would silently
>> generate an invalid MSI message with the high bits of the APIC ID in the
>> high bits of the MSI address. That hack was intended only for the Intel
>> IOMMU, and I recently enforced that, introducing a warning in
>> __irq_msi_compose_msg() if it was invoked with an APIC ID above 255.
>>
>> Fix the AMD IOMMU not to depend on that hack any more, by having its own
>> irqdomain and directly putting the bits from the irq_cfg into the right
>> place in its ->activate() method.
>>
>> Fixes: 47bea873cf80 "x86/msi: Only use high bits of MSI address for DMAR unit")
>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> 
> I'm still working on testing this series using IO_PAGE_FAULT injection to trigger the IOMMU interrupts. I am still 
> debugging some issues, and I'll keep you updated on the findings.
> 
> Thanks,
> Suravee

I might need your help debugging this issue. I'm seeing the following error:

[   14.005937] irq 29, desc: 00000000d200500b, depth: 0, count: 0, unhandled: 0
[   14.006234] ->handle_irq():  00000000eab4b6eb, handle_bad_irq+0x0/0x230
[   14.006234] ->irq_data.chip(): 000000001cce6d6b, intcapxt_controller+0x0/0x120
[   14.006234] ->action(): 0000000083bfd734
[   14.006234] ->action->handler(): 0000000094806345, amd_iommu_int_handler+0x0/0x10
[   14.006234] unexpected IRQ trap at vector 1d

Do you have any idea what might have gone wrong here?

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-18 10:29                                                                   ` Suravee Suthikulpanit
@ 2020-11-18 10:52                                                                     ` David Woodhouse
  2020-11-18 14:06                                                                     ` Thomas Gleixner
  1 sibling, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-11-18 10:52 UTC (permalink / raw)
  To: Suravee Suthikulpanit, Tom Lendacky, Thomas Gleixner, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 878 bytes --]

On Wed, 2020-11-18 at 17:29 +0700, Suravee Suthikulpanit wrote:
> I might need your help debugging this issue. I'm seeing the following error:
> 
> [   14.005937] irq 29, desc: 00000000d200500b, depth: 0, count: 0, unhandled: 0
> [   14.006234] ->handle_irq():  00000000eab4b6eb, handle_bad_irq+0x0/0x230

Where's that line coming from?

> [   14.006234] ->irq_data.chip(): 000000001cce6d6b, intcapxt_controller+0x0/0x120
> [   14.006234] ->action(): 0000000083bfd734
> [   14.006234] ->action->handler(): 0000000094806345, amd_iommu_int_handler+0x0/0x10
> [   14.006234] unexpected IRQ trap at vector 1d
> 
> Do you have any idea what might have gone wrong here?

Hm, vector 0x1d is vector 29. It's also IRQ 29. I wonder if that's a coincidence.

Can you make intcapxt_irqdomain_activate() print the 64-bit value that
it's writing to the intcapxt registers?

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-18 10:29                                                                   ` Suravee Suthikulpanit
  2020-11-18 10:52                                                                     ` David Woodhouse
@ 2020-11-18 14:06                                                                     ` Thomas Gleixner
  2020-11-18 16:51                                                                       ` Suravee Suthikulpanit
  1 sibling, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2020-11-18 14:06 UTC (permalink / raw)
  To: Suravee Suthikulpanit, David Woodhouse, Tom Lendacky, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

Suravee,

On Wed, Nov 18 2020 at 17:29, Suravee Suthikulpanit wrote:
> On 11/17/20 9:00 AM, Suravee Suthikulpanit wrote:
>
> I might need your help debugging this issue. I'm seeing the following error:
>
> [   14.005937] irq 29, desc: 00000000d200500b, depth: 0, count: 0, unhandled: 0
> [   14.006234] ->handle_irq():  00000000eab4b6eb, handle_bad_irq+0x0/0x230
> [   14.006234] ->irq_data.chip(): 000000001cce6d6b, intcapxt_controller+0x0/0x120
> [   14.006234] ->action(): 0000000083bfd734
> [   14.006234] ->action->handler(): 0000000094806345, amd_iommu_int_handler+0x0/0x10
> [   14.006234] unexpected IRQ trap at vector 1d
>
> Do you have any idea what might have gone wrong here?

Yes. This lacks setting up the low level flow handler. Delta patch
below.

Thanks,

        tglx
---
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2033,6 +2033,7 @@ static int intcapxt_irqdomain_alloc(stru
 
 		irqd->chip = &intcapxt_controller;
 		irqd->chip_data = info->data;
+		__irq_set_handler(i, handle_edge_irq, 0, "edge");
 	}
 
 	return ret;

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-18 14:06                                                                     ` Thomas Gleixner
@ 2020-11-18 16:51                                                                       ` Suravee Suthikulpanit
  2020-11-18 17:08                                                                         ` David Woodhouse
  0 siblings, 1 reply; 192+ messages in thread
From: Suravee Suthikulpanit @ 2020-11-18 16:51 UTC (permalink / raw)
  To: Thomas Gleixner, David Woodhouse, Tom Lendacky, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

Tglx,

On 11/18/20 9:06 PM, Thomas Gleixner wrote:
> Suravee,
> 
> On Wed, Nov 18 2020 at 17:29, Suravee Suthikulpanit wrote:
>> On 11/17/20 9:00 AM, Suravee Suthikulpanit wrote:
>>
>> I might need your help debugging this issue. I'm seeing the following error:
>>
>> [   14.005937] irq 29, desc: 00000000d200500b, depth: 0, count: 0, unhandled: 0
>> [   14.006234] ->handle_irq():  00000000eab4b6eb, handle_bad_irq+0x0/0x230
>> [   14.006234] ->irq_data.chip(): 000000001cce6d6b, intcapxt_controller+0x0/0x120
>> [   14.006234] ->action(): 0000000083bfd734
>> [   14.006234] ->action->handler(): 0000000094806345, amd_iommu_int_handler+0x0/0x10
>> [   14.006234] unexpected IRQ trap at vector 1d
>>
>> Do you have any idea what might have gone wrong here?
> 
> Yes. This lacks setting up the low level flow handler. Delta patch
> below.
> 
> Thanks,
> 
>          tglx
> ---
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -2033,6 +2033,7 @@ static int intcapxt_irqdomain_alloc(stru
>   
>   		irqd->chip = &intcapxt_controller;
>   		irqd->chip_data = info->data;
> +		__irq_set_handler(i, handle_edge_irq, 0, "edge");
>   	}
>   
>   	return ret;
> 

Yes, this fixes the issue. Now I can receive the IOMMU event log interrupts for IO_PAGE_FAULT event, which is triggered 
using the injection interface via debugfs.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [EXTERNAL] [tip: x86/apic] x86/io_apic: Cleanup trigger/polarity helpers
  2020-11-18 16:51                                                                       ` Suravee Suthikulpanit
@ 2020-11-18 17:08                                                                         ` David Woodhouse
  0 siblings, 0 replies; 192+ messages in thread
From: David Woodhouse @ 2020-11-18 17:08 UTC (permalink / raw)
  To: Suravee Suthikulpanit, Thomas Gleixner, Tom Lendacky, Borislav Petkov
  Cc: linux-kernel, x86, Qian Cai, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 258 bytes --]

On Wed, 2020-11-18 at 23:51 +0700, Suravee Suthikulpanit wrote:
> Yes, this fixes the issue. Now I can receive the IOMMU event log
> interrupts for IO_PAGE_FAULT event, which is triggered 
> using the injection interface via debugfs.

Thanks, Suravee.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [tip: x86/apic] iommu/amd: Fix IOMMU interrupt generation in X2APIC mode
  2020-11-13 15:14                                                               ` David Woodhouse
  2020-11-16 18:02                                                                 ` David Woodhouse
  2020-11-17  2:00                                                                 ` Suravee Suthikulpanit
@ 2020-11-18 20:16                                                                 ` tip-bot2 for David Woodhouse
  2 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for David Woodhouse @ 2020-11-18 20:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: David Woodhouse, Thomas Gleixner, Suravee Suthikulpanit, x86,
	linux-kernel

The following commit has been merged into the x86/apic branch of tip:

Commit-ID:     d1adcfbb520c43c10fc22fcdccdd4204e014fb53
Gitweb:        https://git.kernel.org/tip/d1adcfbb520c43c10fc22fcdccdd4204e014fb53
Author:        David Woodhouse <dwmw@amazon.co.uk>
AuthorDate:    Wed, 11 Nov 2020 12:09:01 
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 18 Nov 2020 20:55:59 +01:00

iommu/amd: Fix IOMMU interrupt generation in X2APIC mode

The AMD IOMMU has two modes for generating its own interrupts.

The first is very much based on PCI MSI, and can be configured by Linux
precisely that way. But like legacy unmapped PCI MSI it's limited to
8 bits of APIC ID.

The second method does not use PCI MSI at all in hardawre, and instead
configures the INTCAPXT registers in the IOMMU directly with the APIC ID
and vector.

In the latter case, the IOMMU driver would still use pci_enable_msi(),
read back (through MMIO) the MSI message that Linux wrote to the PCI MSI
table, then swizzle those bits into the appropriate register.

Historically, this worked because__irq_compose_msi_msg() would silently
generate an invalid MSI message with the high bits of the APIC ID in the
high bits of the MSI address. That hack was intended only for the Intel
IOMMU, and I recently enforced that, introducing a warning in
__irq_msi_compose_msg() if it was invoked with an APIC ID above 255.

Fix the AMD IOMMU not to depend on that hack any more, by having its own
irqdomain and directly putting the bits from the irq_cfg into the right
place in its ->activate() method.

Fixes: 47bea873cf80 "x86/msi: Only use high bits of MSI address for DMAR unit")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/05e3a5ba317f5ff48d2f8356f19e617f8b9d23a4.camel@infradead.org
---
 arch/x86/include/asm/hw_irq.h |   1 +-
 drivers/iommu/amd/init.c      | 191 ++++++++++++++++++++++-----------
 2 files changed, 133 insertions(+), 59 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 458f5a6..d465ece 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -39,6 +39,7 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_PCI_MSI,
 	X86_IRQ_ALLOC_TYPE_PCI_MSIX,
 	X86_IRQ_ALLOC_TYPE_DMAR,
+	X86_IRQ_ALLOC_TYPE_AMDVI,
 	X86_IRQ_ALLOC_TYPE_UV,
 };
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index a94b96f..07d1f99 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -16,6 +16,7 @@
 #include <linux/syscore_ops.h>
 #include <linux/interrupt.h>
 #include <linux/msi.h>
+#include <linux/irq.h>
 #include <linux/amd-iommu.h>
 #include <linux/export.h>
 #include <linux/kmemleak.h>
@@ -1557,14 +1558,7 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
 			break;
 		}
 
-		/*
-		 * Note: Since iommu_update_intcapxt() leverages
-		 * the IOMMU MMIO access to MSI capability block registers
-		 * for MSI address lo/hi/data, we need to check both
-		 * EFR[XtSup] and EFR[MsiCapMmioSup] for x2APIC support.
-		 */
-		if ((h->efr_reg & BIT(IOMMU_EFR_XTSUP_SHIFT)) &&
-		    (h->efr_reg & BIT(IOMMU_EFR_MSICAPMMIOSUP_SHIFT)))
+		if (h->efr_reg & BIT(IOMMU_EFR_XTSUP_SHIFT))
 			amd_iommu_xt_mode = IRQ_REMAP_X2APIC_MODE;
 		break;
 	default:
@@ -1981,27 +1975,32 @@ union intcapxt {
 } __attribute__ ((packed));
 
 /*
- * Setup the IntCapXT registers with interrupt routing information
- * based on the PCI MSI capability block registers, accessed via
- * MMIO MSI address low/hi and MSI data registers.
+ * There isn't really any need to mask/unmask at the irqchip level because
+ * the 64-bit INTCAPXT registers can be updated atomically without tearing
+ * when the affinity is being updated.
  */
-static void iommu_update_intcapxt(struct amd_iommu *iommu)
+static void intcapxt_unmask_irq(struct irq_data *data)
 {
-	struct msi_msg msg;
-	union intcapxt xt;
-	u32 destid;
+}
 
-	msg.address_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET);
-	msg.address_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET);
-	msg.data = readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET);
+static void intcapxt_mask_irq(struct irq_data *data)
+{
+}
 
-	destid = x86_msi_msg_get_destid(&msg, x2apic_enabled());
+static struct irq_chip intcapxt_controller;
+
+static int intcapxt_irqdomain_activate(struct irq_domain *domain,
+				       struct irq_data *irqd, bool reserve)
+{
+	struct amd_iommu *iommu = irqd->chip_data;
+	struct irq_cfg *cfg = irqd_cfg(irqd);
+	union intcapxt xt;
 
 	xt.capxt = 0ULL;
-	xt.dest_mode_logical = msg.arch_data.dest_mode_logical;
-	xt.vector = msg.arch_data.vector;
-	xt.destid_0_23 = destid & GENMASK(23, 0);
-	xt.destid_24_31 = destid >> 24;
+	xt.dest_mode_logical = apic->dest_mode_logical;
+	xt.vector = cfg->vector;
+	xt.destid_0_23 = cfg->dest_apicid & GENMASK(23, 0);
+	xt.destid_24_31 = cfg->dest_apicid >> 24;
 
 	/**
 	 * Current IOMMU implemtation uses the same IRQ for all
@@ -2010,64 +2009,142 @@ static void iommu_update_intcapxt(struct amd_iommu *iommu)
 	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
 	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
 	writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
+	return 0;
 }
 
-static void _irq_notifier_notify(struct irq_affinity_notify *notify,
-				 const cpumask_t *mask)
+static void intcapxt_irqdomain_deactivate(struct irq_domain *domain,
+					  struct irq_data *irqd)
 {
-	struct amd_iommu *iommu;
+	intcapxt_mask_irq(irqd);
+}
 
-	for_each_iommu(iommu) {
-		if (iommu->dev->irq == notify->irq) {
-			iommu_update_intcapxt(iommu);
-			break;
-		}
+
+static int intcapxt_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
+				    unsigned int nr_irqs, void *arg)
+{
+	struct irq_alloc_info *info = arg;
+	int i, ret;
+
+	if (!info || info->type != X86_IRQ_ALLOC_TYPE_AMDVI)
+		return -EINVAL;
+
+	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+	if (ret < 0)
+		return ret;
+
+	for (i = virq; i < virq + nr_irqs; i++) {
+		struct irq_data *irqd = irq_domain_get_irq_data(domain, i);
+
+		irqd->chip = &intcapxt_controller;
+		irqd->chip_data = info->data;
+		__irq_set_handler(i, handle_edge_irq, 0, "edge");
 	}
+
+	return ret;
 }
 
-static void _irq_notifier_release(struct kref *ref)
+static void intcapxt_irqdomain_free(struct irq_domain *domain, unsigned int virq,
+				    unsigned int nr_irqs)
 {
+	irq_domain_free_irqs_top(domain, virq, nr_irqs);
 }
 
-static int iommu_init_intcapxt(struct amd_iommu *iommu)
+static int intcapxt_set_affinity(struct irq_data *irqd,
+				 const struct cpumask *mask, bool force)
 {
+	struct irq_data *parent = irqd->parent_data;
 	int ret;
-	struct irq_affinity_notify *notify = &iommu->intcapxt_notify;
 
-	/**
-	 * IntCapXT requires XTSup=1 and MsiCapMmioSup=1,
-	 * which can be inferred from amd_iommu_xt_mode.
-	 */
-	if (amd_iommu_xt_mode != IRQ_REMAP_X2APIC_MODE)
-		return 0;
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+		return ret;
 
-	/**
-	 * Also, we need to setup notifier to update the IntCapXT registers
-	 * whenever the irq affinity is changed from user-space.
-	 */
-	notify->irq = iommu->dev->irq;
-	notify->notify = _irq_notifier_notify,
-	notify->release = _irq_notifier_release,
-	ret = irq_set_affinity_notifier(iommu->dev->irq, notify);
+	return intcapxt_irqdomain_activate(irqd->domain, irqd, false);
+}
+
+static struct irq_chip intcapxt_controller = {
+	.name			= "IOMMU-MSI",
+	.irq_unmask		= intcapxt_unmask_irq,
+	.irq_mask		= intcapxt_mask_irq,
+	.irq_ack		= irq_chip_ack_parent,
+	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_set_affinity       = intcapxt_set_affinity,
+	.flags			= IRQCHIP_SKIP_SET_WAKE,
+};
+
+static const struct irq_domain_ops intcapxt_domain_ops = {
+	.alloc			= intcapxt_irqdomain_alloc,
+	.free			= intcapxt_irqdomain_free,
+	.activate		= intcapxt_irqdomain_activate,
+	.deactivate		= intcapxt_irqdomain_deactivate,
+};
+
+
+static struct irq_domain *iommu_irqdomain;
+
+static struct irq_domain *iommu_get_irqdomain(void)
+{
+	struct fwnode_handle *fn;
+
+	/* No need for locking here (yet) as the init is single-threaded */
+	if (iommu_irqdomain)
+		return iommu_irqdomain;
+
+	fn = irq_domain_alloc_named_fwnode("AMD-Vi-MSI");
+	if (!fn)
+		return NULL;
+
+	iommu_irqdomain = irq_domain_create_hierarchy(x86_vector_domain, 0, 0,
+						      fn, &intcapxt_domain_ops,
+						      NULL);
+	if (!iommu_irqdomain)
+		irq_domain_free_fwnode(fn);
+
+	return iommu_irqdomain;
+}
+
+static int iommu_setup_intcapxt(struct amd_iommu *iommu)
+{
+	struct irq_domain *domain;
+	struct irq_alloc_info info;
+	int irq, ret;
+
+	domain = iommu_get_irqdomain();
+	if (!domain)
+		return -ENXIO;
+
+	init_irq_alloc_info(&info, NULL);
+	info.type = X86_IRQ_ALLOC_TYPE_AMDVI;
+	info.data = iommu;
+
+	irq = irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
+	if (irq < 0) {
+		irq_domain_remove(domain);
+		return irq;
+	}
+
+	ret = request_threaded_irq(irq, amd_iommu_int_handler,
+				   amd_iommu_int_thread, 0, "AMD-Vi", iommu);
 	if (ret) {
-		pr_err("Failed to register irq affinity notifier (devid=%#x, irq %d)\n",
-		       iommu->devid, iommu->dev->irq);
+		irq_domain_free_irqs(irq, 1);
+		irq_domain_remove(domain);
 		return ret;
 	}
 
-	iommu_update_intcapxt(iommu);
 	iommu_feature_enable(iommu, CONTROL_INTCAPXT_EN);
-	return ret;
+	return 0;
 }
 
-static int iommu_init_msi(struct amd_iommu *iommu)
+static int iommu_init_irq(struct amd_iommu *iommu)
 {
 	int ret;
 
 	if (iommu->int_enabled)
 		goto enable_faults;
 
-	if (iommu->dev->msi_cap)
+	if (amd_iommu_xt_mode == IRQ_REMAP_X2APIC_MODE)
+		ret = iommu_setup_intcapxt(iommu);
+	else if (iommu->dev->msi_cap)
 		ret = iommu_setup_msi(iommu);
 	else
 		ret = -ENODEV;
@@ -2076,10 +2153,6 @@ static int iommu_init_msi(struct amd_iommu *iommu)
 		return ret;
 
 enable_faults:
-	ret = iommu_init_intcapxt(iommu);
-	if (ret)
-		return ret;
-
 	iommu_feature_enable(iommu, CONTROL_EVT_INT_EN);
 
 	if (iommu->ppr_log != NULL)
@@ -2702,7 +2775,7 @@ static int amd_iommu_enable_interrupts(void)
 	int ret = 0;
 
 	for_each_iommu(iommu) {
-		ret = iommu_init_msi(iommu);
+		ret = iommu_init_irq(iommu);
 		if (ret)
 			goto out;
 	}

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 12/35] x86/msi: Provide msi message shadow structs
  2020-10-24 21:35                     ` [PATCH v3 12/35] x86/msi: Provide msi message shadow structs David Woodhouse
  2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
@ 2022-04-06  8:36                       ` Reto Buerki
  2022-04-06  8:36                         ` [PATCH] x86/msi: Fix msi message data shadow struct Reto Buerki
  2022-04-06 22:07                         ` [PATCH v3 12/35] x86/msi: Provide msi message shadow structs Thomas Gleixner
  1 sibling, 2 replies; 192+ messages in thread
From: Reto Buerki @ 2022-04-06  8:36 UTC (permalink / raw)
  To: dwmw2
  Cc: x86, kvm, iommu, joro, tglx, pbonzini, linux-kernel,
	linux-hyperv, maz, decui

While working on some out-of-tree patches, we noticed that assignment to
dmar_subhandle of struct x86_msi_data lead to a QEMU warning about
reserved bits in MSI data being set:

qemu-system-x86_64: vtd_interrupt_remap_msi: invalid IR MSI (sid=256, address=0xfee003d8, data=0x10000)

This message originates from hw/i386/intel_iommu.c in QEMU:

#define VTD_IR_MSI_DATA_RESERVED (0xffff0000)
if (origin->data & VTD_IR_MSI_DATA_RESERVED) { ... }

Looking at struct x86_msi_data, it appears that it is actually 48-bits in size
since the bitfield containing the vector, delivery_mode etc is 2 bytes wide
followed by dmar_subhandle which is 32 bits. Thus assignment to dmar_subhandle
leads to bits > 16 being set.

If I am not mistaken, the MSI data field should be 32-bits wide for all
platforms (struct msi_msg, include/linux/msi.h). Is this analysis
correct or did I miss something wrt. handling of dmar_subhandle?

The attached patch fixes the issue for us.

Regards,
- reto

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH] x86/msi: Fix msi message data shadow struct
  2022-04-06  8:36                       ` [PATCH v3 12/35] " Reto Buerki
@ 2022-04-06  8:36                         ` Reto Buerki
  2022-04-06 22:11                           ` Thomas Gleixner
  2022-04-06 22:07                         ` [PATCH v3 12/35] x86/msi: Provide msi message shadow structs Thomas Gleixner
  1 sibling, 1 reply; 192+ messages in thread
From: Reto Buerki @ 2022-04-06  8:36 UTC (permalink / raw)
  To: dwmw2
  Cc: x86, kvm, iommu, joro, tglx, pbonzini, linux-kernel,
	linux-hyperv, maz, decui

The x86 MSI message data is 32 bits in total and is either in
compatibility or remappable format, see Intel Virtualization Technology
for Directed I/O, section 5.1.2.

Fixes: 6285aa50736 ("x86/msi: Provide msi message shadow structs")
Signed-off-by: Reto Buerki <reet@codelabs.ch>
Signed-off-by: Adrian-Ken Rueegsegger <ken@codelabs.ch>
---
 arch/x86/include/asm/msi.h | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index b85147d75626..d71c7e8b738d 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -12,14 +12,17 @@ int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 /* Structs and defines for the X86 specific MSI message format */
 
 typedef struct x86_msi_data {
-	u32	vector			:  8,
-		delivery_mode		:  3,
-		dest_mode_logical	:  1,
-		reserved		:  2,
-		active_low		:  1,
-		is_level		:  1;
-
-	u32	dmar_subhandle;
+	union {
+		struct {
+			u32	vector			:  8,
+				delivery_mode		:  3,
+				dest_mode_logical	:  1,
+				reserved		:  2,
+				active_low		:  1,
+				is_level		:  1;
+		};
+		u32	dmar_subhandle;
+	};
 } __attribute__ ((packed)) arch_msi_msg_data_t;
 #define arch_msi_msg_data	x86_msi_data
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 12/35] x86/msi: Provide msi message shadow structs
  2022-04-06  8:36                       ` [PATCH v3 12/35] " Reto Buerki
  2022-04-06  8:36                         ` [PATCH] x86/msi: Fix msi message data shadow struct Reto Buerki
@ 2022-04-06 22:07                         ` Thomas Gleixner
  1 sibling, 0 replies; 192+ messages in thread
From: Thomas Gleixner @ 2022-04-06 22:07 UTC (permalink / raw)
  To: Reto Buerki, dwmw2
  Cc: x86, kvm, iommu, joro, pbonzini, linux-kernel, linux-hyperv, maz, decui

On Wed, Apr 06 2022 at 10:36, Reto Buerki wrote:
> While working on some out-of-tree patches, we noticed that assignment to
> dmar_subhandle of struct x86_msi_data lead to a QEMU warning about
> reserved bits in MSI data being set:
>
> qemu-system-x86_64: vtd_interrupt_remap_msi: invalid IR MSI (sid=256, address=0xfee003d8, data=0x10000)
>
> This message originates from hw/i386/intel_iommu.c in QEMU:
>
> #define VTD_IR_MSI_DATA_RESERVED (0xffff0000)
> if (origin->data & VTD_IR_MSI_DATA_RESERVED) { ... }
>
> Looking at struct x86_msi_data, it appears that it is actually 48-bits in size
> since the bitfield containing the vector, delivery_mode etc is 2 bytes wide
> followed by dmar_subhandle which is 32 bits. Thus assignment to dmar_subhandle
> leads to bits > 16 being set.
>
> If I am not mistaken, the MSI data field should be 32-bits wide for all
> platforms (struct msi_msg, include/linux/msi.h). Is this analysis
> correct or did I miss something wrt. handling of dmar_subhandle?

It's correct and I'm completely surprised that this went unnoticed
for more than a year. Where is that brown paperbag...

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH] x86/msi: Fix msi message data shadow struct
  2022-04-06  8:36                         ` [PATCH] x86/msi: Fix msi message data shadow struct Reto Buerki
@ 2022-04-06 22:11                           ` Thomas Gleixner
  2022-04-07 11:06                             ` Reto Buerki
  0 siblings, 1 reply; 192+ messages in thread
From: Thomas Gleixner @ 2022-04-06 22:11 UTC (permalink / raw)
  To: Reto Buerki, dwmw2
  Cc: x86, kvm, iommu, joro, pbonzini, linux-kernel, linux-hyperv, maz, decui

Reto,

On Wed, Apr 06 2022 at 10:36, Reto Buerki wrote:

> The x86 MSI message data is 32 bits in total and is either in
> compatibility or remappable format, see Intel Virtualization Technology
> for Directed I/O, section 5.1.2.
>
> Fixes: 6285aa50736 ("x86/msi: Provide msi message shadow structs")
> Signed-off-by: Reto Buerki <reet@codelabs.ch>
> Signed-off-by: Adrian-Ken Rueegsegger <ken@codelabs.ch>

This signed-off by chain is incorrect. It would be correct if Adrian-Ken
would have sent the patch.

If you both worked on that then please use the Co-developed-by tag
according to Documentation/process/

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH] x86/msi: Fix msi message data shadow struct
  2022-04-06 22:11                           ` Thomas Gleixner
@ 2022-04-07 11:06                             ` Reto Buerki
  2022-04-07 13:24                               ` [tip: x86/urgent] " tip-bot2 for Reto Buerki
  0 siblings, 1 reply; 192+ messages in thread
From: Reto Buerki @ 2022-04-07 11:06 UTC (permalink / raw)
  To: tglx, dwmw2
  Cc: x86, kvm, iommu, joro, pbonzini, linux-kernel, linux-hyperv, maz, decui

The x86 MSI message data is 32 bits in total and is either in
compatibility or remappable format, see Intel Virtualization Technology
for Directed I/O, section 5.1.2.

Fixes: 6285aa50736 ("x86/msi: Provide msi message shadow structs")
Co-developed-by: Adrian-Ken Rueegsegger <ken@codelabs.ch>
Signed-off-by: Adrian-Ken Rueegsegger <ken@codelabs.ch>
Signed-off-by: Reto Buerki <reet@codelabs.ch>
---
 arch/x86/include/asm/msi.h | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index b85147d75626..d71c7e8b738d 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -12,14 +12,17 @@ int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 /* Structs and defines for the X86 specific MSI message format */
 
 typedef struct x86_msi_data {
-	u32	vector			:  8,
-		delivery_mode		:  3,
-		dest_mode_logical	:  1,
-		reserved		:  2,
-		active_low		:  1,
-		is_level		:  1;
-
-	u32	dmar_subhandle;
+	union {
+		struct {
+			u32	vector			:  8,
+				delivery_mode		:  3,
+				dest_mode_logical	:  1,
+				reserved		:  2,
+				active_low		:  1,
+				is_level		:  1;
+		};
+		u32	dmar_subhandle;
+	};
 } __attribute__ ((packed)) arch_msi_msg_data_t;
 #define arch_msi_msg_data	x86_msi_data
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [tip: x86/urgent] x86/msi: Fix msi message data shadow struct
  2022-04-07 11:06                             ` Reto Buerki
@ 2022-04-07 13:24                               ` tip-bot2 for Reto Buerki
  0 siblings, 0 replies; 192+ messages in thread
From: tip-bot2 for Reto Buerki @ 2022-04-07 13:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Adrian-Ken Rueegsegger, Reto Buerki, Thomas Gleixner, stable,
	x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     59b18a1e65b7a2134814106d0860010e10babe18
Gitweb:        https://git.kernel.org/tip/59b18a1e65b7a2134814106d0860010e10babe18
Author:        Reto Buerki <reet@codelabs.ch>
AuthorDate:    Thu, 07 Apr 2022 13:06:47 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 07 Apr 2022 15:19:32 +02:00

x86/msi: Fix msi message data shadow struct

The x86 MSI message data is 32 bits in total and is either in
compatibility or remappable format, see Intel Virtualization Technology
for Directed I/O, section 5.1.2.

Fixes: 6285aa50736 ("x86/msi: Provide msi message shadow structs")
Co-developed-by: Adrian-Ken Rueegsegger <ken@codelabs.ch>
Signed-off-by: Adrian-Ken Rueegsegger <ken@codelabs.ch>
Signed-off-by: Reto Buerki <reet@codelabs.ch>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220407110647.67372-1-reet@codelabs.ch
---
 arch/x86/include/asm/msi.h | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index b85147d..d71c7e8 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -12,14 +12,17 @@ int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 /* Structs and defines for the X86 specific MSI message format */
 
 typedef struct x86_msi_data {
-	u32	vector			:  8,
-		delivery_mode		:  3,
-		dest_mode_logical	:  1,
-		reserved		:  2,
-		active_low		:  1,
-		is_level		:  1;
-
-	u32	dmar_subhandle;
+	union {
+		struct {
+			u32	vector			:  8,
+				delivery_mode		:  3,
+				dest_mode_logical	:  1,
+				reserved		:  2,
+				active_low		:  1,
+				is_level		:  1;
+		};
+		u32	dmar_subhandle;
+	};
 } __attribute__ ((packed)) arch_msi_msg_data_t;
 #define arch_msi_msg_data	x86_msi_data
 

^ permalink raw reply related	[flat|nested] 192+ messages in thread

end of thread, other threads:[~2022-04-07 13:24 UTC | newest]

Thread overview: 192+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-07 12:20 [PATCH 0/5] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
2020-10-07 12:20 ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
2020-10-07 12:20   ` [PATCH 2/5] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
2020-10-07 12:20   ` [PATCH 3/5] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
2020-10-08  9:12     ` Peter Zijlstra
2020-10-08 17:05       ` David Woodhouse
2020-10-08 11:41     ` Thomas Gleixner
2020-10-07 12:20   ` [PATCH 4/5] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available David Woodhouse
2020-10-08 11:54     ` Thomas Gleixner
2020-10-08 12:02       ` Thomas Gleixner
2020-10-08 13:00       ` David Woodhouse
2020-10-07 12:20   ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
2020-10-08 12:05     ` Thomas Gleixner
2020-10-08 12:55       ` David Woodhouse
2020-10-08 16:08         ` David Woodhouse
2020-10-08 21:14           ` Thomas Gleixner
2020-10-08 21:39             ` David Woodhouse
2020-10-08 23:27               ` Thomas Gleixner
2020-10-09  6:07                 ` David Woodhouse
2020-10-10 10:06                 ` David Woodhouse
2020-10-10 11:44                   ` Thomas Gleixner
2020-10-10 11:58                     ` David Woodhouse
2020-10-11 17:12                       ` Thomas Gleixner
2020-10-11 21:15                         ` David Woodhouse
2020-10-12  9:33                           ` Thomas Gleixner
2020-10-12 16:06                             ` David Woodhouse
2020-10-12 18:38                               ` Thomas Gleixner
2020-10-12 20:20                                 ` David Woodhouse
2020-10-12 22:13                                   ` Thomas Gleixner
2020-10-13  7:52                                     ` David Woodhouse
2020-10-13  8:11                                       ` [PATCH 0/9] Remove irq_remapping_get_irq_domain() David Woodhouse
2020-10-13  8:11                                         ` [PATCH 1/9] genirq/irqdomain: Implement get_name() method on irqchip fwnodes David Woodhouse
2020-10-13  8:11                                         ` [PATCH 2/9] x86/apic: Add select() method on vector irqdomain David Woodhouse
2020-10-13  8:11                                         ` [PATCH 3/9] iommu/amd: Implement select() method on remapping irqdomain David Woodhouse
2020-10-13  8:11                                         ` [PATCH 4/9] iommu/vt-d: " David Woodhouse
2020-10-13  8:11                                         ` [PATCH 5/9] iommu/hyper-v: " David Woodhouse
2020-10-13  8:11                                         ` [PATCH 6/9] x86/hpet: Use irq_find_matching_fwspec() to find " David Woodhouse
2020-10-13  8:11                                         ` [PATCH 7/9] x86/ioapic: " David Woodhouse
2020-10-13  8:11                                         ` [PATCH 8/9] x86: Kill all traces of irq_remapping_get_irq_domain() David Woodhouse
2020-10-13  8:11                                         ` [PATCH 9/9] iommu/vt-d: Simplify intel_irq_remapping_select() David Woodhouse
2020-10-13  9:28                                       ` [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID Thomas Gleixner
2020-10-13 10:15                                         ` David Woodhouse
2020-10-13 10:46                                         ` Thomas Gleixner
2020-10-13 10:53                                           ` David Woodhouse
2020-10-13 11:51                                             ` David Woodhouse
2020-10-13 12:40                                               ` Thomas Gleixner
2020-10-08 11:46   ` [PATCH 1/5] x86/apic: Fix x2apic enablement without interrupt remapping Thomas Gleixner
2020-10-09 10:46 ` [PATCH v2 0/8] Fix x2apic enablement and allow up to 32768 CPUs without IR where supported David Woodhouse
2020-10-09 10:46   ` [PATCH v2 1/8] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
2020-10-09 10:46   ` [PATCH v2 2/8] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
2020-10-09 10:46   ` [PATCH v2 3/8] x86/apic: Always provide irq_compose_msi_msg() method for vector domain David Woodhouse
2020-10-09 10:46   ` [PATCH v2 4/8] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
2020-10-09 10:46   ` [PATCH v2 5/8] x86/apic: Support 15 bits of APIC ID in MSI where available David Woodhouse
2020-10-09 10:46   ` [PATCH v2 6/8] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
2020-10-09 10:46   ` [PATCH v2 7/8] x86/hpet: Move MSI support into hpet.c David Woodhouse
2020-10-09 10:46   ` [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message David Woodhouse
2020-10-22 21:43     ` Thomas Gleixner
2020-10-22 22:10       ` Thomas Gleixner
2020-10-23 17:04         ` David Woodhouse
2020-10-23 10:10       ` David Woodhouse
2020-10-23 21:28         ` Thomas Gleixner
2020-10-24  8:26           ` David Woodhouse
2020-10-24  8:41             ` David Woodhouse
2020-10-24  9:13             ` Paolo Bonzini
2020-10-24 10:13               ` David Woodhouse
2020-10-24 12:44                 ` David Woodhouse
2020-10-24 21:35                   ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 01/35] x86/apic: Fix x2apic enablement without interrupt remapping David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 02/35] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 03/35] x86/apic/uv: Fix inconsistent destination mode David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 04/35] x86/devicetree: Fix the ioapic interrupt type table David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 05/35] x86/apic: Cleanup delivery mode defines David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 06/35] x86/apic: Replace pointless apic::dest_logical usage David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] x86/apic: Replace pointless apic:: Dest_logical usage tip-bot2 for Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 07/35] x86/apic: Get rid of apic::dest_logical David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] x86/apic: Get rid of apic:: Dest_logical tip-bot2 for Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 08/35] x86/apic: Cleanup destination mode David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 09/35] x86/apic: Always provide irq_compose_msi_msg() method for vector domain David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 10/35] x86/hpet: Move MSI support into hpet.c David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 11/35] genirq/msi: Allow shadow declarations of msi_msg::$member David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] genirq/msi: Allow shadow declarations of msi_msg:: $member tip-bot2 for Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 12/35] x86/msi: Provide msi message shadow structs David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
2022-04-06  8:36                       ` [PATCH v3 12/35] " Reto Buerki
2022-04-06  8:36                         ` [PATCH] x86/msi: Fix msi message data shadow struct Reto Buerki
2022-04-06 22:11                           ` Thomas Gleixner
2022-04-07 11:06                             ` Reto Buerki
2022-04-07 13:24                               ` [tip: x86/urgent] " tip-bot2 for Reto Buerki
2022-04-06 22:07                         ` [PATCH v3 12/35] x86/msi: Provide msi message shadow structs Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 13/35] iommu/intel: Use msi_msg " David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 14/35] iommu/amd: " David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 15/35] PCI: vmd: " David Woodhouse
2020-10-28 20:49                       ` Kees Cook
2020-10-28 21:13                         ` Thomas Gleixner
2020-10-28 23:22                           ` Kees Cook
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 16/35] x86/kvm: " David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 17/35] x86/pci/xen: " David Woodhouse
2020-10-25  9:49                       ` David Laight
2020-10-25 10:26                         ` David Woodhouse
2020-10-25 13:20                           ` David Laight
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 18/35] x86/msi: Remove msidef.h David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
2020-11-09 23:15                         ` Tom Lendacky
2020-11-10  3:30                           ` [EXTERNAL] " David Woodhouse
2020-11-10  6:10                           ` Borislav Petkov
2020-11-10 14:34                             ` Thomas Gleixner
2020-11-10 14:55                               ` Tom Lendacky
2020-11-10 15:54                                 ` Thomas Gleixner
2020-11-10 16:01                                   ` [EXTERNAL] " David Woodhouse
2020-11-10 16:17                                   ` Tom Lendacky
2020-11-10 16:33                                     ` [EXTERNAL] " David Woodhouse
2020-11-10 17:04                                       ` Tom Lendacky
2020-11-10 17:50                                       ` Thomas Gleixner
2020-11-10 18:56                                         ` Thomas Gleixner
2020-11-10 19:21                                           ` David Woodhouse
2020-11-10 21:01                                             ` Thomas Gleixner
2020-11-10 21:30                                               ` David Woodhouse
2020-11-10 22:00                                                 ` Tom Lendacky
2020-11-10 22:48                                                   ` Thomas Gleixner
2020-11-10 23:05                                                     ` Tom Lendacky
2020-11-11  8:16                                                     ` David Woodhouse
2020-11-11  9:46                                                       ` Thomas Gleixner
2020-11-11 10:36                                                         ` David Woodhouse
2020-11-11 12:32                                                           ` David Woodhouse
2020-11-11 20:30                                                             ` Tom Lendacky
2020-11-11 21:19                                                               ` David Woodhouse
2020-11-13 15:14                                                               ` David Woodhouse
2020-11-16 18:02                                                                 ` David Woodhouse
2020-11-17  2:00                                                                 ` Suravee Suthikulpanit
2020-11-18 10:29                                                                   ` Suravee Suthikulpanit
2020-11-18 10:52                                                                     ` David Woodhouse
2020-11-18 14:06                                                                     ` Thomas Gleixner
2020-11-18 16:51                                                                       ` Suravee Suthikulpanit
2020-11-18 17:08                                                                         ` David Woodhouse
2020-11-18 20:16                                                                 ` [tip: x86/apic] iommu/amd: Fix IOMMU interrupt generation in X2APIC mode tip-bot2 for David Woodhouse
2020-11-11 14:43                                                         ` [PATCH 1/3] iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled David Woodhouse
2020-11-11 14:43                                                           ` [PATCH 2/3] iommu/amd: Fix union of bitfields in intcapxt support David Woodhouse
2020-11-11 22:06                                                             ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-11-11 14:43                                                           ` [PATCH 3/3] iommu/amd: Fix IOMMU interrupt generation in X2APIC mode David Woodhouse
2020-11-11 22:06                                                           ` [tip: x86/apic] iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled tip-bot2 for David Woodhouse
2020-11-10 17:47                               ` [tip: x86/apic] x86/ioapic: Correct the PCI/ISA trigger type selection tip-bot2 for Thomas Gleixner
2020-11-10  6:31                       ` [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers Qian Cai
2020-11-10  8:59                         ` David Woodhouse
2020-11-10 16:26                           ` Paolo Bonzini
2020-10-24 21:35                     ` [PATCH v3 20/35] x86/ioapic: Cleanup IO/APIC route entry structs David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for Thomas Gleixner
2020-10-24 21:35                     ` [PATCH v3 21/35] x86/ioapic: Generate RTE directly from parent irqchip's MSI message David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 22/35] genirq/irqdomain: Implement get_name() method on irqchip fwnodes David Woodhouse
2020-10-25  9:41                       ` Marc Zyngier
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 23/35] x86/apic: Add select() method on vector irqdomain David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 24/35] iommu/amd: Implement select() method on remapping irqdomain David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 25/35] iommu/vt-d: " David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 26/35] iommu/hyper-v: " David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 27/35] x86/hpet: Use irq_find_matching_fwspec() to find " David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 28/35] x86/ioapic: " David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 29/35] x86: Kill all traces of irq_remapping_get_irq_domain() David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 30/35] iommu/vt-d: Simplify intel_irq_remapping_select() David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 31/35] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 32/35] x86/apic: Support 15 bits of APIC ID in MSI where available David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 33/35] iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 34/35] x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
2020-10-24 21:35                     ` [PATCH v3 35/35] x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected David Woodhouse
2020-10-29 12:15                       ` [tip: x86/apic] " tip-bot2 for David Woodhouse
2020-10-25  8:12                     ` [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields David Woodhouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).