iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root
       [not found] <20210120120058.29138-1-wei.liu@kernel.org>
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-20 16:08   ` Pavel Tatashin
  2021-01-26  0:33   ` Michael Kelley via iommu
  2021-01-20 12:00 ` [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition Wei Liu
  1 sibling, 2 replies; 8+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: K. Y. Srinivasan, Wei Liu, Joerg Roedel, Stephen Hemminger,
	pasha.tatashin, Will Deacon, Haiyang Zhang, Linux Kernel List,
	Michael Kelley, open list:IOMMU DRIVERS, Nuno Das Neves,
	Sunil Muthuswamy, virtualization, Vineeth Pillai,
	Vitaly Kuznetsov

The IOMMU code needs more work. We're sure for now the IRQ remapping
hooks are not applicable when Linux is the root partition.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/iommu/hyperv-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 1d21a0b5f724..b7db6024e65c 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -20,6 +20,7 @@
 #include <asm/io_apic.h>
 #include <asm/irq_remapping.h>
 #include <asm/hypervisor.h>
+#include <asm/mshyperv.h>
 
 #include "irq_remapping.h"
 
@@ -122,7 +123,7 @@ static int __init hyperv_prepare_irq_remapping(void)
 
 	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
 	    x86_init.hyper.msi_ext_dest_id() ||
-	    !x2apic_supported())
+	    !x2apic_supported() || hv_root_partition)
 		return -ENODEV;
 
 	fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
-- 
2.20.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
       [not found] <20210120120058.29138-1-wei.liu@kernel.org>
  2021-01-20 12:00 ` [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-27  5:47   ` Michael Kelley via iommu
  1 sibling, 1 reply; 8+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: Wei Liu, K. Y. Srinivasan, Stephen Hemminger, pasha.tatashin,
	Will Deacon, Haiyang Zhang,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, Linux Kernel List,
	Michael Kelley, open list:IOMMU DRIVERS, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Nuno Das Neves,
	Sunil Muthuswamy, virtualization, Vineeth Pillai,
	Thomas Gleixner

Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
Hypervisor when Linux runs as the root partition. Implement an IRQ
domain to handle mapping and unmapping of IO-APIC interrupts.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
---
 arch/x86/hyperv/irqdomain.c     |  54 ++++++++++
 arch/x86/include/asm/mshyperv.h |   4 +
 drivers/iommu/hyperv-iommu.c    | 179 +++++++++++++++++++++++++++++++-
 3 files changed, 233 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
index 19637cd60231..8e2b4e478b70 100644
--- a/arch/x86/hyperv/irqdomain.c
+++ b/arch/x86/hyperv/irqdomain.c
@@ -330,3 +330,57 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
 }
 
 #endif /* CONFIG_PCI_MSI */
+
+int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
+{
+	union hv_device_id device_id;
+
+	device_id.as_uint64 = 0;
+	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
+	device_id.ioapic.ioapic_id = (u8)ioapic_id;
+
+	return hv_unmap_interrupt(device_id.as_uint64, entry) & HV_HYPERCALL_RESULT_MASK;
+}
+EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
+
+int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
+		struct hv_interrupt_entry *entry)
+{
+	unsigned long flags;
+	struct hv_input_map_device_interrupt *input;
+	struct hv_output_map_device_interrupt *output;
+	union hv_device_id device_id;
+	struct hv_device_interrupt_descriptor *intr_desc;
+	u16 status;
+
+	device_id.as_uint64 = 0;
+	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
+	device_id.ioapic.ioapic_id = (u8)ioapic_id;
+
+	local_irq_save(flags);
+	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+	memset(input, 0, sizeof(*input));
+	intr_desc = &input->interrupt_descriptor;
+	input->partition_id = hv_current_partition_id;
+	input->device_id = device_id.as_uint64;
+	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
+	intr_desc->target.vector = vector;
+	intr_desc->vector_count = 1;
+
+	if (level)
+		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
+	else
+		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
+
+	__set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
+
+	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, output) &
+			 HV_HYPERCALL_RESULT_MASK;
+	local_irq_restore(flags);
+
+	*entry = output->interrupt_entry;
+
+	return status;
+}
+EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ccc849e25d5e..345d7c6f8c37 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
 
 struct irq_domain *hv_create_pci_msi_domain(void);
 
+int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
+		struct hv_interrupt_entry *entry);
+int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
+
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index b7db6024e65c..6d35e4c303c6 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -116,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = {
 	.free = hyperv_irq_remapping_free,
 };
 
+static const struct irq_domain_ops hyperv_root_ir_domain_ops;
 static int __init hyperv_prepare_irq_remapping(void)
 {
 	struct fwnode_handle *fn;
 	int i;
+	const char *name;
+	const struct irq_domain_ops *ops;
 
 	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
 	    x86_init.hyper.msi_ext_dest_id() ||
-	    !x2apic_supported() || hv_root_partition)
+	    !x2apic_supported())
 		return -ENODEV;
 
-	fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
+	if (hv_root_partition) {
+		name = "HYPERV-ROOT-IR";
+		ops = &hyperv_root_ir_domain_ops;
+	} else {
+		name = "HYPERV-IR";
+		ops = &hyperv_ir_domain_ops;
+	}
+
+	fn = irq_domain_alloc_named_id_fwnode(name, 0);
 	if (!fn)
 		return -ENOMEM;
 
 	ioapic_ir_domain =
 		irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
-				0, IOAPIC_REMAPPING_ENTRY, fn,
-				&hyperv_ir_domain_ops, NULL);
+				0, IOAPIC_REMAPPING_ENTRY, fn, ops, NULL);
 
 	if (!ioapic_ir_domain) {
 		irq_domain_free_fwnode(fn);
 		return -ENOMEM;
 	}
 
+	if (hv_root_partition)
+		return 0; /* The rest is only relevant to guests */
+
 	/*
 	 * Hyper-V doesn't provide irq remapping function for
 	 * IO-APIC and so IO-APIC only accepts 8-bit APIC ID.
@@ -167,4 +180,162 @@ struct irq_remap_ops hyperv_irq_remap_ops = {
 	.enable			= hyperv_enable_irq_remapping,
 };
 
+/* IRQ remapping domain when Linux runs as the root partition */
+struct hyperv_root_ir_data {
+	u8 ioapic_id;
+	bool is_level;
+	struct hv_interrupt_entry entry;
+};
+
+static void
+hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
+{
+	u16 status;
+	u32 vector;
+	struct irq_cfg *cfg;
+	int ioapic_id;
+	struct cpumask *affinity;
+	int cpu, vcpu;
+	struct hv_interrupt_entry entry;
+	struct hyperv_root_ir_data *data = irq_data->chip_data;
+	struct IO_APIC_route_entry e;
+
+	cfg = irqd_cfg(irq_data);
+	affinity = irq_data_get_effective_affinity_mask(irq_data);
+	cpu = cpumask_first_and(affinity, cpu_online_mask);
+	vcpu = hv_cpu_number_to_vp_number(cpu);
+
+	vector = cfg->vector;
+	ioapic_id = data->ioapic_id;
+
+	if (data->entry.source == HV_DEVICE_TYPE_IOAPIC
+	    && data->entry.ioapic_rte.as_uint64) {
+		entry = data->entry;
+
+		status = hv_unmap_ioapic_interrupt(ioapic_id, &entry);
+
+		if (status != HV_STATUS_SUCCESS)
+			pr_debug("%s: unexpected unmap status %d\n", __func__, status);
+
+		data->entry.ioapic_rte.as_uint64 = 0;
+		data->entry.source = 0; /* Invalid source */
+	}
+
+
+	status = hv_map_ioapic_interrupt(ioapic_id, data->is_level, vcpu,
+					vector, &entry);
+
+	if (status != HV_STATUS_SUCCESS) {
+		pr_err("%s: map hypercall failed, status %d\n", __func__, status);
+		return;
+	}
+
+	data->entry = entry;
+
+	/* Turn it into an IO_APIC_route_entry, and generate MSI MSG. */
+	e.w1 = entry.ioapic_rte.low_uint32;
+	e.w2 = entry.ioapic_rte.high_uint32;
+
+	memset(msg, 0, sizeof(*msg));
+	msg->arch_data.vector = e.vector;
+	msg->arch_data.delivery_mode = e.delivery_mode;
+	msg->arch_addr_lo.dest_mode_logical = e.dest_mode_logical;
+	msg->arch_addr_lo.dmar_format = e.ir_format;
+	msg->arch_addr_lo.dmar_index_0_14 = e.ir_index_0_14;
+}
+
+static int hyperv_root_ir_set_affinity(struct irq_data *data,
+		const struct cpumask *mask, bool force)
+{
+	struct irq_data *parent = data->parent_data;
+	struct irq_cfg *cfg = irqd_cfg(data);
+	int ret;
+
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+		return ret;
+
+	send_cleanup_vector(cfg);
+
+	return 0;
+}
+
+static struct irq_chip hyperv_root_ir_chip = {
+	.name			= "HYPERV-ROOT-IR",
+	.irq_ack		= apic_ack_irq,
+	.irq_set_affinity	= hyperv_root_ir_set_affinity,
+	.irq_compose_msi_msg	= hyperv_root_ir_compose_msi_msg,
+};
+
+static int hyperv_root_irq_remapping_alloc(struct irq_domain *domain,
+				     unsigned int virq, unsigned int nr_irqs,
+				     void *arg)
+{
+	struct irq_alloc_info *info = arg;
+	struct irq_data *irq_data;
+	struct hyperv_root_ir_data *data;
+	int ret = 0;
+
+	if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1)
+		return -EINVAL;
+
+	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+	if (ret < 0)
+		return ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data) {
+		irq_domain_free_irqs_common(domain, virq, nr_irqs);
+		return -ENOMEM;
+	}
+
+	irq_data = irq_domain_get_irq_data(domain, virq);
+	if (!irq_data) {
+		kfree(data);
+		irq_domain_free_irqs_common(domain, virq, nr_irqs);
+		return -EINVAL;
+	}
+
+	data->ioapic_id = info->devid;
+	data->is_level = info->ioapic.is_level;
+
+	irq_data->chip = &hyperv_root_ir_chip;
+	irq_data->chip_data = data;
+
+	return 0;
+}
+
+static void hyperv_root_irq_remapping_free(struct irq_domain *domain,
+				 unsigned int virq, unsigned int nr_irqs)
+{
+	struct irq_data *irq_data;
+	struct hyperv_root_ir_data *data;
+	struct hv_interrupt_entry *e;
+	int i;
+
+	for (i = 0; i < nr_irqs; i++) {
+		irq_data = irq_domain_get_irq_data(domain, virq + i);
+
+		if (irq_data && irq_data->chip_data) {
+			data = irq_data->chip_data;
+			e = &data->entry;
+
+			if (e->source == HV_DEVICE_TYPE_IOAPIC
+			      && e->ioapic_rte.as_uint64)
+				hv_unmap_ioapic_interrupt(data->ioapic_id,
+							&data->entry);
+
+			kfree(data);
+		}
+	}
+
+	irq_domain_free_irqs_common(domain, virq, nr_irqs);
+}
+
+static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
+	.select = hyperv_irq_remapping_select,
+	.alloc = hyperv_root_irq_remapping_alloc,
+	.free = hyperv_root_irq_remapping_free,
+};
+
 #endif
-- 
2.20.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root
  2021-01-20 12:00 ` [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root Wei Liu
@ 2021-01-20 16:08   ` Pavel Tatashin
  2021-01-26  0:33   ` Michael Kelley via iommu
  1 sibling, 0 replies; 8+ messages in thread
From: Pavel Tatashin @ 2021-01-20 16:08 UTC (permalink / raw)
  To: Wei Liu
  Cc: K. Y. Srinivasan, Linux on Hyper-V List, Joerg Roedel,
	Stephen Hemminger, Will Deacon, Haiyang Zhang, Linux Kernel List,
	Michael Kelley, open list:IOMMU DRIVERS, Nuno Das Neves,
	Sunil Muthuswamy, virtualization, Vineeth Pillai,
	Vitaly Kuznetsov

On Wed, Jan 20, 2021 at 7:01 AM Wei Liu <wei.liu@kernel.org> wrote:
>
> The IOMMU code needs more work. We're sure for now the IRQ remapping
> hooks are not applicable when Linux is the root partition.
>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> Acked-by: Joerg Roedel <jroedel@suse.de>
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  drivers/iommu/hyperv-iommu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index 1d21a0b5f724..b7db6024e65c 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -20,6 +20,7 @@
>  #include <asm/io_apic.h>
>  #include <asm/irq_remapping.h>
>  #include <asm/hypervisor.h>
> +#include <asm/mshyperv.h>
>
>  #include "irq_remapping.h"
>
> @@ -122,7 +123,7 @@ static int __init hyperv_prepare_irq_remapping(void)
>
>         if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
>             x86_init.hyper.msi_ext_dest_id() ||
> -           !x2apic_supported())
> +           !x2apic_supported() || hv_root_partition)
>                 return -ENODEV;

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root
  2021-01-20 12:00 ` [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root Wei Liu
  2021-01-20 16:08   ` Pavel Tatashin
@ 2021-01-26  0:33   ` Michael Kelley via iommu
  1 sibling, 0 replies; 8+ messages in thread
From: Michael Kelley via iommu @ 2021-01-26  0:33 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: Joerg Roedel, Stephen Hemminger, pasha.tatashin, Will Deacon,
	Haiyang Zhang, Linux Kernel List, virtualization,
	open list:IOMMU DRIVERS, Nuno Das Neves, Sunil Muthuswamy,
	KY Srinivasan, Vineeth Pillai, vkuznets

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> The IOMMU code needs more work. We're sure for now the IRQ remapping
> hooks are not applicable when Linux is the root partition.
> 
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> Acked-by: Joerg Roedel <jroedel@suse.de>
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  drivers/iommu/hyperv-iommu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index 1d21a0b5f724..b7db6024e65c 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -20,6 +20,7 @@
>  #include <asm/io_apic.h>
>  #include <asm/irq_remapping.h>
>  #include <asm/hypervisor.h>
> +#include <asm/mshyperv.h>
> 
>  #include "irq_remapping.h"
> 
> @@ -122,7 +123,7 @@ static int __init hyperv_prepare_irq_remapping(void)
> 
>  	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
>  	    x86_init.hyper.msi_ext_dest_id() ||
> -	    !x2apic_supported())
> +	    !x2apic_supported() || hv_root_partition)
>  		return -ENODEV;
> 
>  	fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> --
> 2.20.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
  2021-01-20 12:00 ` [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition Wei Liu
@ 2021-01-27  5:47   ` Michael Kelley via iommu
  2021-02-03 12:47     ` Wei Liu
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Kelley via iommu @ 2021-01-27  5:47 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: Stephen Hemminger, pasha.tatashin, Will Deacon, Haiyang Zhang,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Linux Kernel List, virtualization, open list:IOMMU DRIVERS,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Nuno Das Neves,
	Sunil Muthuswamy, KY Srinivasan, Vineeth Pillai, Thomas Gleixner

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> Hypervisor when Linux runs as the root partition. Implement an IRQ
> domain to handle mapping and unmapping of IO-APIC interrupts.
> 
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
>  arch/x86/hyperv/irqdomain.c     |  54 ++++++++++
>  arch/x86/include/asm/mshyperv.h |   4 +
>  drivers/iommu/hyperv-iommu.c    | 179 +++++++++++++++++++++++++++++++-
>  3 files changed, 233 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index 19637cd60231..8e2b4e478b70 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -330,3 +330,57 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
>  }
> 
>  #endif /* CONFIG_PCI_MSI */
> +
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
> +{
> +	union hv_device_id device_id;
> +
> +	device_id.as_uint64 = 0;
> +	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> +	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> +	return hv_unmap_interrupt(device_id.as_uint64, entry) & HV_HYPERCALL_RESULT_MASK;

The masking is already done in hv_unmap_interrupt.

> +}
> +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> +
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> +		struct hv_interrupt_entry *entry)
> +{
> +	unsigned long flags;
> +	struct hv_input_map_device_interrupt *input;
> +	struct hv_output_map_device_interrupt *output;
> +	union hv_device_id device_id;
> +	struct hv_device_interrupt_descriptor *intr_desc;
> +	u16 status;
> +
> +	device_id.as_uint64 = 0;
> +	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> +	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> +	local_irq_save(flags);
> +	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> +	memset(input, 0, sizeof(*input));
> +	intr_desc = &input->interrupt_descriptor;
> +	input->partition_id = hv_current_partition_id;
> +	input->device_id = device_id.as_uint64;
> +	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> +	intr_desc->target.vector = vector;
> +	intr_desc->vector_count = 1;
> +
> +	if (level)
> +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> +	else
> +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> +
> +	__set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> +
> +	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, output) &
> +			 HV_HYPERCALL_RESULT_MASK;
> +	local_irq_restore(flags);
> +
> +	*entry = output->interrupt_entry;
> +
> +	return status;

As a cross-check, I was comparing this code against hv_map_msi_interrupt().  They are
mostly parallel, though some of the assignments are done in a different order.  It's a nit,
but making them as parallel as possible would be nice. :-)

Same 64 vCPU comment applies here as well.


> +}
> +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ccc849e25d5e..345d7c6f8c37 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union
> hv_msi_entry *msi_entry,
> 
>  struct irq_domain *hv_create_pci_msi_domain(void);
> 
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> +		struct hv_interrupt_entry *entry);
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> +
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
>  static inline void hyperv_setup_mmu_ops(void) {}
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index b7db6024e65c..6d35e4c303c6 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -116,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = {
>  	.free = hyperv_irq_remapping_free,
>  };
> 
> +static const struct irq_domain_ops hyperv_root_ir_domain_ops;
>  static int __init hyperv_prepare_irq_remapping(void)
>  {
>  	struct fwnode_handle *fn;
>  	int i;
> +	const char *name;
> +	const struct irq_domain_ops *ops;
> 
>  	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
>  	    x86_init.hyper.msi_ext_dest_id() ||
> -	    !x2apic_supported() || hv_root_partition)
> +	    !x2apic_supported())

Any reason that the check for hv_root_partition was added
in patch #4  of this series, and then removed here?  Could
patch #4 just be dropped?

>  		return -ENODEV;
> 
> -	fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> +	if (hv_root_partition) {
> +		name = "HYPERV-ROOT-IR";
> +		ops = &hyperv_root_ir_domain_ops;
> +	} else {
> +		name = "HYPERV-IR";
> +		ops = &hyperv_ir_domain_ops;
> +	}
> +
> +	fn = irq_domain_alloc_named_id_fwnode(name, 0);
>  	if (!fn)
>  		return -ENOMEM;
> 
>  	ioapic_ir_domain =
>  		irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
> -				0, IOAPIC_REMAPPING_ENTRY, fn,
> -				&hyperv_ir_domain_ops, NULL);
> +				0, IOAPIC_REMAPPING_ENTRY, fn, ops, NULL);
> 
>  	if (!ioapic_ir_domain) {
>  		irq_domain_free_fwnode(fn);
>  		return -ENOMEM;
>  	}
> 
> +	if (hv_root_partition)
> +		return 0; /* The rest is only relevant to guests */
> +
>  	/*
>  	 * Hyper-V doesn't provide irq remapping function for
>  	 * IO-APIC and so IO-APIC only accepts 8-bit APIC ID.
> @@ -167,4 +180,162 @@ struct irq_remap_ops hyperv_irq_remap_ops = {
>  	.enable			= hyperv_enable_irq_remapping,
>  };
> 
> +/* IRQ remapping domain when Linux runs as the root partition */
> +struct hyperv_root_ir_data {
> +	u8 ioapic_id;
> +	bool is_level;
> +	struct hv_interrupt_entry entry;
> +};
> +
> +static void
> +hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
> +{
> +	u16 status;
> +	u32 vector;
> +	struct irq_cfg *cfg;
> +	int ioapic_id;
> +	struct cpumask *affinity;
> +	int cpu, vcpu;
> +	struct hv_interrupt_entry entry;
> +	struct hyperv_root_ir_data *data = irq_data->chip_data;
> +	struct IO_APIC_route_entry e;
> +
> +	cfg = irqd_cfg(irq_data);
> +	affinity = irq_data_get_effective_affinity_mask(irq_data);
> +	cpu = cpumask_first_and(affinity, cpu_online_mask);
> +	vcpu = hv_cpu_number_to_vp_number(cpu);
> +
> +	vector = cfg->vector;
> +	ioapic_id = data->ioapic_id;
> +
> +	if (data->entry.source == HV_DEVICE_TYPE_IOAPIC

Does 'data' need to be checked to be non-NULL?  The parallel code in
hv_irq_compose_msi_msg() makes such a check.

> +	    && data->entry.ioapic_rte.as_uint64) {
> +		entry = data->entry;
> +
> +		status = hv_unmap_ioapic_interrupt(ioapic_id, &entry);
> +
> +		if (status != HV_STATUS_SUCCESS)
> +			pr_debug("%s: unexpected unmap status %d\n", __func__, status);
> +
> +		data->entry.ioapic_rte.as_uint64 = 0;
> +		data->entry.source = 0; /* Invalid source */

Again comparing, hv_irq_compose_msi_msg() frees the old
entry, and then allocates a new one.   This code reuses the old entry. 
Any reason for the difference?

> +	}
> +
> +
> +	status = hv_map_ioapic_interrupt(ioapic_id, data->is_level, vcpu,
> +					vector, &entry);
> +
> +	if (status != HV_STATUS_SUCCESS) {
> +		pr_err("%s: map hypercall failed, status %d\n", __func__, status);
> +		return;
> +	}
> +
> +	data->entry = entry;
> +
> +	/* Turn it into an IO_APIC_route_entry, and generate MSI MSG. */
> +	e.w1 = entry.ioapic_rte.low_uint32;
> +	e.w2 = entry.ioapic_rte.high_uint32;
> +
> +	memset(msg, 0, sizeof(*msg));
> +	msg->arch_data.vector = e.vector;
> +	msg->arch_data.delivery_mode = e.delivery_mode;
> +	msg->arch_addr_lo.dest_mode_logical = e.dest_mode_logical;
> +	msg->arch_addr_lo.dmar_format = e.ir_format;
> +	msg->arch_addr_lo.dmar_index_0_14 = e.ir_index_0_14;
> +}

Having this whole function be more parallel to hv_irq_compose_msi_msg()
would be nice. :-)

> +
> +static int hyperv_root_ir_set_affinity(struct irq_data *data,
> +		const struct cpumask *mask, bool force)
> +{
> +	struct irq_data *parent = data->parent_data;
> +	struct irq_cfg *cfg = irqd_cfg(data);
> +	int ret;
> +
> +	ret = parent->chip->irq_set_affinity(parent, mask, force);
> +	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
> +		return ret;
> +
> +	send_cleanup_vector(cfg);
> +
> +	return 0;
> +}
> +
> +static struct irq_chip hyperv_root_ir_chip = {
> +	.name			= "HYPERV-ROOT-IR",
> +	.irq_ack		= apic_ack_irq,
> +	.irq_set_affinity	= hyperv_root_ir_set_affinity,
> +	.irq_compose_msi_msg	= hyperv_root_ir_compose_msi_msg,
> +};
> +
> +static int hyperv_root_irq_remapping_alloc(struct irq_domain *domain,
> +				     unsigned int virq, unsigned int nr_irqs,
> +				     void *arg)
> +{
> +	struct irq_alloc_info *info = arg;
> +	struct irq_data *irq_data;
> +	struct hyperv_root_ir_data *data;
> +	int ret = 0;
> +
> +	if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1)
> +		return -EINVAL;
> +
> +	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
> +	if (ret < 0)
> +		return ret;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data) {
> +		irq_domain_free_irqs_common(domain, virq, nr_irqs);
> +		return -ENOMEM;
> +	}
> +
> +	irq_data = irq_domain_get_irq_data(domain, virq);
> +	if (!irq_data) {
> +		kfree(data);
> +		irq_domain_free_irqs_common(domain, virq, nr_irqs);
> +		return -EINVAL;
> +	}
> +
> +	data->ioapic_id = info->devid;
> +	data->is_level = info->ioapic.is_level;
> +
> +	irq_data->chip = &hyperv_root_ir_chip;
> +	irq_data->chip_data = data;
> +
> +	return 0;
> +}
> +
> +static void hyperv_root_irq_remapping_free(struct irq_domain *domain,
> +				 unsigned int virq, unsigned int nr_irqs)
> +{
> +	struct irq_data *irq_data;
> +	struct hyperv_root_ir_data *data;
> +	struct hv_interrupt_entry *e;
> +	int i;
> +
> +	for (i = 0; i < nr_irqs; i++) {
> +		irq_data = irq_domain_get_irq_data(domain, virq + i);
> +
> +		if (irq_data && irq_data->chip_data) {
> +			data = irq_data->chip_data;

Set irq_data->chip_data to NULL?  That seems to be done in other
similar places in your code.

> +			e = &data->entry;
> +
> +			if (e->source == HV_DEVICE_TYPE_IOAPIC
> +			      && e->ioapic_rte.as_uint64)
> +				hv_unmap_ioapic_interrupt(data->ioapic_id,
> +							&data->entry);
> +
> +			kfree(data);
> +		}
> +	}
> +
> +	irq_domain_free_irqs_common(domain, virq, nr_irqs);
> +}
> +
> +static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
> +	.select = hyperv_irq_remapping_select,
> +	.alloc = hyperv_root_irq_remapping_alloc,
> +	.free = hyperv_root_irq_remapping_free,
> +};
> +
>  #endif
> --
> 2.20.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
  2021-01-27  5:47   ` Michael Kelley via iommu
@ 2021-02-03 12:47     ` Wei Liu
  2021-02-04 16:41       ` Michael Kelley via iommu
  0 siblings, 1 reply; 8+ messages in thread
From: Wei Liu @ 2021-02-03 12:47 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Wei Liu, Stephen Hemminger, pasha.tatashin,
	Linux on Hyper-V List, Will Deacon, Haiyang Zhang,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Linux Kernel List, virtualization, open list:IOMMU DRIVERS,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Nuno Das Neves,
	Sunil Muthuswamy, KY Srinivasan, Vineeth Pillai, Thomas Gleixner

On Wed, Jan 27, 2021 at 05:47:08AM +0000, Michael Kelley wrote:
> From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > 
> > Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> > Hypervisor when Linux runs as the root partition. Implement an IRQ
> > domain to handle mapping and unmapping of IO-APIC interrupts.
> > 
> > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > ---
> >  arch/x86/hyperv/irqdomain.c     |  54 ++++++++++
> >  arch/x86/include/asm/mshyperv.h |   4 +
> >  drivers/iommu/hyperv-iommu.c    | 179 +++++++++++++++++++++++++++++++-
> >  3 files changed, 233 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > index 19637cd60231..8e2b4e478b70 100644
> > --- a/arch/x86/hyperv/irqdomain.c
> > +++ b/arch/x86/hyperv/irqdomain.c
> > @@ -330,3 +330,57 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
> >  }
> > 
> >  #endif /* CONFIG_PCI_MSI */
> > +
> > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
> > +{
> > +	union hv_device_id device_id;
> > +
> > +	device_id.as_uint64 = 0;
> > +	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > +	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > +
> > +	return hv_unmap_interrupt(device_id.as_uint64, entry) & HV_HYPERCALL_RESULT_MASK;
> 
> The masking is already done in hv_unmap_interrupt.

Fixed.

> 
> > +}
> > +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> > +
> > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> > +		struct hv_interrupt_entry *entry)
> > +{
> > +	unsigned long flags;
> > +	struct hv_input_map_device_interrupt *input;
> > +	struct hv_output_map_device_interrupt *output;
> > +	union hv_device_id device_id;
> > +	struct hv_device_interrupt_descriptor *intr_desc;
> > +	u16 status;
> > +
> > +	device_id.as_uint64 = 0;
> > +	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > +	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > +
> > +	local_irq_save(flags);
> > +	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > +	memset(input, 0, sizeof(*input));
> > +	intr_desc = &input->interrupt_descriptor;
> > +	input->partition_id = hv_current_partition_id;
> > +	input->device_id = device_id.as_uint64;
> > +	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > +	intr_desc->target.vector = vector;
> > +	intr_desc->vector_count = 1;
> > +
> > +	if (level)
> > +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > +	else
> > +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > +
> > +	__set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> > +
> > +	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, output) &
> > +			 HV_HYPERCALL_RESULT_MASK;
> > +	local_irq_restore(flags);
> > +
> > +	*entry = output->interrupt_entry;
> > +
> > +	return status;
> 
> As a cross-check, I was comparing this code against hv_map_msi_interrupt().  They are
> mostly parallel, though some of the assignments are done in a different order.  It's a nit,
> but making them as parallel as possible would be nice. :-)
> 

Indeed. I will see about factoring out a function.

> Same 64 vCPU comment applies here as well.
> 

This is changed to use vpset instead. Took me a bit of time to get it
working because document is a bit lacking.

> 
> > +}
> > +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > index ccc849e25d5e..345d7c6f8c37 100644
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
> > @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union
> > hv_msi_entry *msi_entry,
> > 
> >  struct irq_domain *hv_create_pci_msi_domain(void);
> > 
> > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> > +		struct hv_interrupt_entry *entry);
> > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> > +
> >  #else /* CONFIG_HYPERV */
> >  static inline void hyperv_init(void) {}
> >  static inline void hyperv_setup_mmu_ops(void) {}
> > diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> > index b7db6024e65c..6d35e4c303c6 100644
> > --- a/drivers/iommu/hyperv-iommu.c
> > +++ b/drivers/iommu/hyperv-iommu.c
> > @@ -116,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = {
> >  	.free = hyperv_irq_remapping_free,
> >  };
> > 
> > +static const struct irq_domain_ops hyperv_root_ir_domain_ops;
> >  static int __init hyperv_prepare_irq_remapping(void)
> >  {
> >  	struct fwnode_handle *fn;
> >  	int i;
> > +	const char *name;
> > +	const struct irq_domain_ops *ops;
> > 
> >  	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
> >  	    x86_init.hyper.msi_ext_dest_id() ||
> > -	    !x2apic_supported() || hv_root_partition)
> > +	    !x2apic_supported())
> 
> Any reason that the check for hv_root_partition was added
> in patch #4  of this series, and then removed here?  Could
> patch #4 just be dropped?
> 

Before v5 (or v4?) IO-APIC was not handled via Hyper-V IOMMU. Now it is.

Patch 4 has become redundant with that change. I already dropped patch 4
in the v6 branch I have locally.

> >  		return -ENODEV;
> > 
> > -	fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> > +	if (hv_root_partition) {
> > +		name = "HYPERV-ROOT-IR";
> > +		ops = &hyperv_root_ir_domain_ops;
> > +	} else {
> > +		name = "HYPERV-IR";
> > +		ops = &hyperv_ir_domain_ops;
> > +	}
> > +
[...]
> > +static void
> > +hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
> > +{
> > +	u16 status;
> > +	u32 vector;
> > +	struct irq_cfg *cfg;
> > +	int ioapic_id;
> > +	struct cpumask *affinity;
> > +	int cpu, vcpu;
> > +	struct hv_interrupt_entry entry;
> > +	struct hyperv_root_ir_data *data = irq_data->chip_data;
> > +	struct IO_APIC_route_entry e;
> > +
> > +	cfg = irqd_cfg(irq_data);
> > +	affinity = irq_data_get_effective_affinity_mask(irq_data);
> > +	cpu = cpumask_first_and(affinity, cpu_online_mask);
> > +	vcpu = hv_cpu_number_to_vp_number(cpu);
> > +
> > +	vector = cfg->vector;
> > +	ioapic_id = data->ioapic_id;
> > +
> > +	if (data->entry.source == HV_DEVICE_TYPE_IOAPIC
> 
> Does 'data' need to be checked to be non-NULL?  The parallel code in
> hv_irq_compose_msi_msg() makes such a check.

The usage of irq_data->chip_data is different in these two functions.

In this function, we're sure it is correctly allocated by
hyperv_root_ir_remapping_alloc at some point before.

In hv_irq_compose_msi_msg, irq_data->chip_data is instead used as a
temporary place to stash some state that is controlled solely by the
said function.

Once we get to the point of introducing a paravirtualized IOMMU for the
root partition, we can then unify these two paths.

> 
> > +	    && data->entry.ioapic_rte.as_uint64) {
> > +		entry = data->entry;
> > +
> > +		status = hv_unmap_ioapic_interrupt(ioapic_id, &entry);
> > +
> > +		if (status != HV_STATUS_SUCCESS)
> > +			pr_debug("%s: unexpected unmap status %d\n", __func__, status);
> > +
> > +		data->entry.ioapic_rte.as_uint64 = 0;
> > +		data->entry.source = 0; /* Invalid source */
> 
> Again comparing, hv_irq_compose_msi_msg() frees the old
> entry, and then allocates a new one.   This code reuses the old entry. 
> Any reason for the difference?
> 

See above.

I can perhaps tweak the logic a bit to reuse the same entry, but the
overall design won't change. I opted to always reallocate because that
looked more straight-forward to me.

Let me know if you feel strongly about reusing.

> > +	}
> > +
> > +
> > +	status = hv_map_ioapic_interrupt(ioapic_id, data->is_level, vcpu,
> > +					vector, &entry);
> > +
> > +	if (status != HV_STATUS_SUCCESS) {
> > +		pr_err("%s: map hypercall failed, status %d\n", __func__, status);
> > +		return;
> > +	}
> > +
> > +	data->entry = entry;
> > +
> > +	/* Turn it into an IO_APIC_route_entry, and generate MSI MSG. */
> > +	e.w1 = entry.ioapic_rte.low_uint32;
> > +	e.w2 = entry.ioapic_rte.high_uint32;
> > +
> > +	memset(msg, 0, sizeof(*msg));
> > +	msg->arch_data.vector = e.vector;
> > +	msg->arch_data.delivery_mode = e.delivery_mode;
> > +	msg->arch_addr_lo.dest_mode_logical = e.dest_mode_logical;
> > +	msg->arch_addr_lo.dmar_format = e.ir_format;
> > +	msg->arch_addr_lo.dmar_index_0_14 = e.ir_index_0_14;
> > +}
> 
> Having this whole function be more parallel to hv_irq_compose_msi_msg()
> would be nice. :-)
> 

Unlike hv_map_ioapic_interrupt and hv_map_msi_interrupt, which can
benefit from unifying now, this and hv_irq_compose_msi_msg will need to
wait till we have an IOMMU for the reason I stated above.

> > +
> > +static int hyperv_root_ir_set_affinity(struct irq_data *data,
> > +		const struct cpumask *mask, bool force)
> > +{
> > +	struct irq_data *parent = data->parent_data;
> > +	struct irq_cfg *cfg = irqd_cfg(data);
> > +	int ret;
> > +
> > +	ret = parent->chip->irq_set_affinity(parent, mask, force);
> > +	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
> > +		return ret;
> > +
> > +	send_cleanup_vector(cfg);
> > +
> > +	return 0;
> > +}
> > +
[...]
> > +
> > +static void hyperv_root_irq_remapping_free(struct irq_domain *domain,
> > +				 unsigned int virq, unsigned int nr_irqs)
> > +{
> > +	struct irq_data *irq_data;
> > +	struct hyperv_root_ir_data *data;
> > +	struct hv_interrupt_entry *e;
> > +	int i;
> > +
> > +	for (i = 0; i < nr_irqs; i++) {
> > +		irq_data = irq_domain_get_irq_data(domain, virq + i);
> > +
> > +		if (irq_data && irq_data->chip_data) {
> > +			data = irq_data->chip_data;
> 
> Set irq_data->chip_data to NULL?  That seems to be done in other
> similar places in your code.

There is no need to do that. By the time this function returns, irq_data
will be gone too -- freed by irq_domain_free_irqs_common.

> 
> > +			e = &data->entry;
> > +
> > +			if (e->source == HV_DEVICE_TYPE_IOAPIC
> > +			      && e->ioapic_rte.as_uint64)
> > +				hv_unmap_ioapic_interrupt(data->ioapic_id,
> > +							&data->entry);
> > +
> > +			kfree(data);
> > +		}
> > +	}
> > +
> > +	irq_domain_free_irqs_common(domain, virq, nr_irqs);
> > +}
> > +
> > +static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
> > +	.select = hyperv_irq_remapping_select,
> > +	.alloc = hyperv_root_irq_remapping_alloc,
> > +	.free = hyperv_root_irq_remapping_free,
> > +};
> > +
> >  #endif
> > --
> > 2.20.1
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
  2021-02-03 12:47     ` Wei Liu
@ 2021-02-04 16:41       ` Michael Kelley via iommu
  2021-02-04 16:48         ` Wei Liu
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Kelley via iommu @ 2021-02-04 16:41 UTC (permalink / raw)
  To: Wei Liu
  Cc: Linux on Hyper-V List, Stephen Hemminger, pasha.tatashin,
	Will Deacon, Haiyang Zhang,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Linux Kernel List, virtualization, open list:IOMMU DRIVERS,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Nuno Das Neves,
	Sunil Muthuswamy, KY Srinivasan, Vineeth Pillai, Thomas Gleixner

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, February 3, 2021 4:47 AM
> 
> On Wed, Jan 27, 2021 at 05:47:08AM +0000, Michael Kelley wrote:
> > From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > >
> > > Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> > > Hypervisor when Linux runs as the root partition. Implement an IRQ
> > > domain to handle mapping and unmapping of IO-APIC interrupts.
> > >
> > > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > > ---
> > >  arch/x86/hyperv/irqdomain.c     |  54 ++++++++++
> > >  arch/x86/include/asm/mshyperv.h |   4 +
> > >  drivers/iommu/hyperv-iommu.c    | 179 +++++++++++++++++++++++++++++++-
> > >  3 files changed, 233 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > > index 19637cd60231..8e2b4e478b70 100644
> > > --- a/arch/x86/hyperv/irqdomain.c
> > > +++ b/arch/x86/hyperv/irqdomain.c
> > > @@ -330,3 +330,57 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
> > >  }
> > >
> > >  #endif /* CONFIG_PCI_MSI */
> > > +
> > > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
> > > +{
> > > +	union hv_device_id device_id;
> > > +
> > > +	device_id.as_uint64 = 0;
> > > +	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > > +	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > > +
> > > +	return hv_unmap_interrupt(device_id.as_uint64, entry) &
> HV_HYPERCALL_RESULT_MASK;
> >
> > The masking is already done in hv_unmap_interrupt.
> 
> Fixed.
> 
> >
> > > +}
> > > +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> > > +
> > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> > > +		struct hv_interrupt_entry *entry)
> > > +{
> > > +	unsigned long flags;
> > > +	struct hv_input_map_device_interrupt *input;
> > > +	struct hv_output_map_device_interrupt *output;
> > > +	union hv_device_id device_id;
> > > +	struct hv_device_interrupt_descriptor *intr_desc;
> > > +	u16 status;
> > > +
> > > +	device_id.as_uint64 = 0;
> > > +	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > > +	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > > +
> > > +	local_irq_save(flags);
> > > +	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > > +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > > +	memset(input, 0, sizeof(*input));
> > > +	intr_desc = &input->interrupt_descriptor;
> > > +	input->partition_id = hv_current_partition_id;
> > > +	input->device_id = device_id.as_uint64;
> > > +	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > > +	intr_desc->target.vector = vector;
> > > +	intr_desc->vector_count = 1;
> > > +
> > > +	if (level)
> > > +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > > +	else
> > > +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > > +
> > > +	__set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> > > +
> > > +	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input,
> output) &
> > > +			 HV_HYPERCALL_RESULT_MASK;
> > > +	local_irq_restore(flags);
> > > +
> > > +	*entry = output->interrupt_entry;
> > > +
> > > +	return status;
> >
> > As a cross-check, I was comparing this code against hv_map_msi_interrupt().  They are
> > mostly parallel, though some of the assignments are done in a different order.  It's a nit,
> > but making them as parallel as possible would be nice. :-)
> >
> 
> Indeed. I will see about factoring out a function.

If factoring out a separate helper function is clumsy, just having the parallel code
in the two functions be as similar as possible makes it easier to see what's the
same and what's different.

> 
> > Same 64 vCPU comment applies here as well.
> >
> 
> This is changed to use vpset instead. Took me a bit of time to get it
> working because document is a bit lacking.
> 
> >
> > > +}
> > > +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> > > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > > index ccc849e25d5e..345d7c6f8c37 100644
> > > --- a/arch/x86/include/asm/mshyperv.h
> > > +++ b/arch/x86/include/asm/mshyperv.h
> > > @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union
> > > hv_msi_entry *msi_entry,
> > >
> > >  struct irq_domain *hv_create_pci_msi_domain(void);
> > >
> > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> > > +		struct hv_interrupt_entry *entry);
> > > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> > > +
> > >  #else /* CONFIG_HYPERV */
> > >  static inline void hyperv_init(void) {}
> > >  static inline void hyperv_setup_mmu_ops(void) {}
> > > diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> > > index b7db6024e65c..6d35e4c303c6 100644
> > > --- a/drivers/iommu/hyperv-iommu.c
> > > +++ b/drivers/iommu/hyperv-iommu.c
> > > @@ -116,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = {
> > >  	.free = hyperv_irq_remapping_free,
> > >  };
> > >
> > > +static const struct irq_domain_ops hyperv_root_ir_domain_ops;
> > >  static int __init hyperv_prepare_irq_remapping(void)
> > >  {
> > >  	struct fwnode_handle *fn;
> > >  	int i;
> > > +	const char *name;
> > > +	const struct irq_domain_ops *ops;
> > >
> > >  	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
> > >  	    x86_init.hyper.msi_ext_dest_id() ||
> > > -	    !x2apic_supported() || hv_root_partition)
> > > +	    !x2apic_supported())
> >
> > Any reason that the check for hv_root_partition was added
> > in patch #4  of this series, and then removed here?  Could
> > patch #4 just be dropped?
> >
> 
> Before v5 (or v4?) IO-APIC was not handled via Hyper-V IOMMU. Now it is.
> 
> Patch 4 has become redundant with that change. I already dropped patch 4
> in the v6 branch I have locally.
> 
> > >  		return -ENODEV;
> > >
> > > -	fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> > > +	if (hv_root_partition) {
> > > +		name = "HYPERV-ROOT-IR";
> > > +		ops = &hyperv_root_ir_domain_ops;
> > > +	} else {
> > > +		name = "HYPERV-IR";
> > > +		ops = &hyperv_ir_domain_ops;
> > > +	}
> > > +
> [...]
> > > +static void
> > > +hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
> > > +{
> > > +	u16 status;
> > > +	u32 vector;
> > > +	struct irq_cfg *cfg;
> > > +	int ioapic_id;
> > > +	struct cpumask *affinity;
> > > +	int cpu, vcpu;
> > > +	struct hv_interrupt_entry entry;
> > > +	struct hyperv_root_ir_data *data = irq_data->chip_data;
> > > +	struct IO_APIC_route_entry e;
> > > +
> > > +	cfg = irqd_cfg(irq_data);
> > > +	affinity = irq_data_get_effective_affinity_mask(irq_data);
> > > +	cpu = cpumask_first_and(affinity, cpu_online_mask);
> > > +	vcpu = hv_cpu_number_to_vp_number(cpu);
> > > +
> > > +	vector = cfg->vector;
> > > +	ioapic_id = data->ioapic_id;
> > > +
> > > +	if (data->entry.source == HV_DEVICE_TYPE_IOAPIC
> >
> > Does 'data' need to be checked to be non-NULL?  The parallel code in
> > hv_irq_compose_msi_msg() makes such a check.
> 
> The usage of irq_data->chip_data is different in these two functions.
> 
> In this function, we're sure it is correctly allocated by
> hyperv_root_ir_remapping_alloc at some point before.
> 
> In hv_irq_compose_msi_msg, irq_data->chip_data is instead used as a
> temporary place to stash some state that is controlled solely by the
> said function.
> 
> Once we get to the point of introducing a paravirtualized IOMMU for the
> root partition, we can then unify these two paths.

OK, thanks for the explanation.

> 
> >
> > > +	    && data->entry.ioapic_rte.as_uint64) {
> > > +		entry = data->entry;
> > > +
> > > +		status = hv_unmap_ioapic_interrupt(ioapic_id, &entry);
> > > +
> > > +		if (status != HV_STATUS_SUCCESS)
> > > +			pr_debug("%s: unexpected unmap status %d\n", __func__,
> status);
> > > +
> > > +		data->entry.ioapic_rte.as_uint64 = 0;
> > > +		data->entry.source = 0; /* Invalid source */
> >
> > Again comparing, hv_irq_compose_msi_msg() frees the old
> > entry, and then allocates a new one.   This code reuses the old entry.
> > Any reason for the difference?
> >
> 
> See above.
> 
> I can perhaps tweak the logic a bit to reuse the same entry, but the
> overall design won't change. I opted to always reallocate because that
> looked more straight-forward to me.
> 
> Let me know if you feel strongly about reusing.

I don't feel strongly about reusing.  I was just comparing/contrasting
the two functions.

> 
> > > +	}
> > > +
> > > +
> > > +	status = hv_map_ioapic_interrupt(ioapic_id, data->is_level, vcpu,
> > > +					vector, &entry);
> > > +
> > > +	if (status != HV_STATUS_SUCCESS) {
> > > +		pr_err("%s: map hypercall failed, status %d\n", __func__, status);
> > > +		return;
> > > +	}
> > > +
> > > +	data->entry = entry;
> > > +
> > > +	/* Turn it into an IO_APIC_route_entry, and generate MSI MSG. */
> > > +	e.w1 = entry.ioapic_rte.low_uint32;
> > > +	e.w2 = entry.ioapic_rte.high_uint32;
> > > +
> > > +	memset(msg, 0, sizeof(*msg));
> > > +	msg->arch_data.vector = e.vector;
> > > +	msg->arch_data.delivery_mode = e.delivery_mode;
> > > +	msg->arch_addr_lo.dest_mode_logical = e.dest_mode_logical;
> > > +	msg->arch_addr_lo.dmar_format = e.ir_format;
> > > +	msg->arch_addr_lo.dmar_index_0_14 = e.ir_index_0_14;
> > > +}
> >
> > Having this whole function be more parallel to hv_irq_compose_msi_msg()
> > would be nice. :-)
> >
> 
> Unlike hv_map_ioapic_interrupt and hv_map_msi_interrupt, which can
> benefit from unifying now, this and hv_irq_compose_msi_msg will need to
> wait till we have an IOMMU for the reason I stated above.

OK.  Just having the code in the two functions be more parallel where
possible would make it easier to see similarities and differences.  But
it's not a big deal.

> 
> > > +
> > > +static int hyperv_root_ir_set_affinity(struct irq_data *data,
> > > +		const struct cpumask *mask, bool force)
> > > +{
> > > +	struct irq_data *parent = data->parent_data;
> > > +	struct irq_cfg *cfg = irqd_cfg(data);
> > > +	int ret;
> > > +
> > > +	ret = parent->chip->irq_set_affinity(parent, mask, force);
> > > +	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
> > > +		return ret;
> > > +
> > > +	send_cleanup_vector(cfg);
> > > +
> > > +	return 0;
> > > +}
> > > +
> [...]
> > > +
> > > +static void hyperv_root_irq_remapping_free(struct irq_domain *domain,
> > > +				 unsigned int virq, unsigned int nr_irqs)
> > > +{
> > > +	struct irq_data *irq_data;
> > > +	struct hyperv_root_ir_data *data;
> > > +	struct hv_interrupt_entry *e;
> > > +	int i;
> > > +
> > > +	for (i = 0; i < nr_irqs; i++) {
> > > +		irq_data = irq_domain_get_irq_data(domain, virq + i);
> > > +
> > > +		if (irq_data && irq_data->chip_data) {
> > > +			data = irq_data->chip_data;
> >
> > Set irq_data->chip_data to NULL?  That seems to be done in other
> > similar places in your code.
> 
> There is no need to do that. By the time this function returns, irq_data
> will be gone too -- freed by irq_domain_free_irqs_common.

OK

> 
> >
> > > +			e = &data->entry;
> > > +
> > > +			if (e->source == HV_DEVICE_TYPE_IOAPIC
> > > +			      && e->ioapic_rte.as_uint64)
> > > +				hv_unmap_ioapic_interrupt(data->ioapic_id,
> > > +							&data->entry);
> > > +
> > > +			kfree(data);
> > > +		}
> > > +	}
> > > +
> > > +	irq_domain_free_irqs_common(domain, virq, nr_irqs);
> > > +}
> > > +
> > > +static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
> > > +	.select = hyperv_irq_remapping_select,
> > > +	.alloc = hyperv_root_irq_remapping_alloc,
> > > +	.free = hyperv_root_irq_remapping_free,
> > > +};
> > > +
> > >  #endif
> > > --
> > > 2.20.1
> >
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
  2021-02-04 16:41       ` Michael Kelley via iommu
@ 2021-02-04 16:48         ` Wei Liu
  0 siblings, 0 replies; 8+ messages in thread
From: Wei Liu @ 2021-02-04 16:48 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Wei Liu, Stephen Hemminger, pasha.tatashin,
	Linux on Hyper-V List, Will Deacon, Haiyang Zhang,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Linux Kernel List, virtualization, open list:IOMMU DRIVERS,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Nuno Das Neves,
	Sunil Muthuswamy, KY Srinivasan, Vineeth Pillai, Thomas Gleixner

On Thu, Feb 04, 2021 at 04:41:47PM +0000, Michael Kelley wrote:
> From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, February 3, 2021 4:47 AM
[...]
> > > > +
> > > > +	if (level)
> > > > +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > > > +	else
> > > > +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > > > +
> > > > +	__set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> > > > +
> > > > +	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input,
> > output) &
> > > > +			 HV_HYPERCALL_RESULT_MASK;
> > > > +	local_irq_restore(flags);
> > > > +
> > > > +	*entry = output->interrupt_entry;
> > > > +
> > > > +	return status;
> > >
> > > As a cross-check, I was comparing this code against hv_map_msi_interrupt().  They are
> > > mostly parallel, though some of the assignments are done in a different order.  It's a nit,
> > > but making them as parallel as possible would be nice. :-)
> > >
> > 
> > Indeed. I will see about factoring out a function.
> 
> If factoring out a separate helper function is clumsy, just having the parallel code
> in the two functions be as similar as possible makes it easier to see what's the
> same and what's different.
> 

No. It is not clumsy at all. I've done it in the newly posted v6.

I was baffled why I wrote hv_unmap_interrupt helper to be used by both
hv_unmap_ioapic_interrupt and hv_unmap_msi_interrupt in the previous
patch, but didn't write a hv_map_interrupt. Maybe I didn't have enough
coffee that day. :-/

Thanks for pointing out that issue. It definitely helped improve the
quality of this series.

Wei.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-02-04 16:48 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210120120058.29138-1-wei.liu@kernel.org>
2021-01-20 12:00 ` [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root Wei Liu
2021-01-20 16:08   ` Pavel Tatashin
2021-01-26  0:33   ` Michael Kelley via iommu
2021-01-20 12:00 ` [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition Wei Liu
2021-01-27  5:47   ` Michael Kelley via iommu
2021-02-03 12:47     ` Wei Liu
2021-02-04 16:41       ` Michael Kelley via iommu
2021-02-04 16:48         ` Wei Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).