All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: David Woodhouse <dwmw2@infradead.org>, x86@kernel.org
Cc: iommu <iommu@lists.linux-foundation.org>,
	kvm <kvm@vger.kernel.org>,
	linux-hyperv@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()
Date: Thu, 08 Oct 2020 14:40:57 +0200	[thread overview]
Message-ID: <878scgx546.fsf@nanos.tec.linutronix.de> (raw)
In-Reply-To: <2c0712352812ab114cb711236703fd7c308a5bf2.camel@infradead.org>

On Thu, Oct 08 2020 at 12:10, David Woodhouse wrote:
> On Thu, 2020-10-08 at 11:34 +0200, Thomas Gleixner wrote:
>> The overall conclusion for this is:
>> 
>>  1) X2APIC support on bare metal w/o irq remapping is not going to
>>     happen unless you:
>> 
>>       - added support in multi-queue devices which utilize managed
>>         interrupts
>>         
>>       - audited the whole tree for other assumptions related to the
>>         reachability of possible CPUs.
>> 
>>     I'm not expecting you to be done with that before I retire so for
>>     me it's just not going to happen :)
>
> Makes sense. It probably does mean we should a BUG_ON for the case
> where IRQ remapping *is* enabled but any device is found which isn't
> behind it. But that's OK.

We can kinda gracefully handle that. See the completely untested and
incomplete patch below.

>>  2) X2APIC support on VIRT is possible if the extended ID magic is
>>     supported by the hypervisor because that does not make any CPU
>>     unreachable for MSI and therefore the multi-queue muck and
>>     everything else just works.
>> 
>>     This requires to have either the domain affinity limitation for HPET
>>     in place or just to force disable HPET or at least HPET-MSI which is
>>     a reasonable tradeoff.
>> 
>>     HPET is not required for guests which have kvmclock and
>>     APIC/deadline timer and known (hypervisor provided) frequencies.
>
> HPET-MSI should work fine. Like the IOAPIC, it's just a child of the
> *actual* MSI domain. The address/data in the MSI message are completely
> opaque to it, and if the parent domain happens to put meaningful
> information into bits 11-5 of the MSI address, the HPET won't even
> notice.
>
> The HPET's Tn_FSB_INT_ADDR register does have a full 32 bits of the MSI
> address; it's not doing bit-swizzling like the IOAPIC does, which might
> potentially *not* have been able to set certain bits in the MSI.

Indeed. I thought it was crippled in some way, but you're right it has
all the bits.

Thanks,

        tglx
---
Subject: x86/iommu: Make interrupt remapping more robust
From: Thomas Gleixner <tglx@linutronix.de>
Date: Thu, 08 Oct 2020 14:09:44 +0200

Needs to be split into pieces and cover PCI proper. Right now PCI gets a
NULL pointer assigned which makes it explode at the wrong place
later. Also hyperv iommu wants some love.

NOT-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/io_apic.c      |    4 +++-
 arch/x86/kernel/apic/msi.c          |   24 ++++++++++++++----------
 drivers/iommu/amd/iommu.c           |    6 +++---
 drivers/iommu/intel/irq_remapping.c |    4 ++--
 4 files changed, 22 insertions(+), 16 deletions(-)

--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2300,7 +2300,9 @@ static int mp_irqdomain_create(int ioapi
 	info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
 	info.devid = mpc_ioapic_id(ioapic);
 	parent = irq_remapping_get_irq_domain(&info);
-	if (!parent)
+	if (IS_ERR(parent))
+		return PTR_ERR(parent);
+	else if (!parent)
 		parent = x86_vector_domain;
 	else
 		name = "IO-APIC-IR";
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -415,9 +415,9 @@ static struct msi_domain_info hpet_msi_d
 struct irq_domain *hpet_create_irq_domain(int hpet_id)
 {
 	struct msi_domain_info *domain_info;
+	struct fwnode_handle *fn = NULL;
 	struct irq_domain *parent, *d;
 	struct irq_alloc_info info;
-	struct fwnode_handle *fn;
 
 	if (x86_vector_domain == NULL)
 		return NULL;
@@ -432,25 +432,29 @@ struct irq_domain *hpet_create_irq_domai
 	init_irq_alloc_info(&info, NULL);
 	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
 	info.devid = hpet_id;
+
 	parent = irq_remapping_get_irq_domain(&info);
-	if (parent == NULL)
+	if (IS_ERR(parent))
+		goto fail;
+	else if (!parent)
 		parent = x86_vector_domain;
 	else
 		hpet_msi_controller.name = "IR-HPET-MSI";
 
 	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
 					      hpet_id);
-	if (!fn) {
-		kfree(domain_info);
-		return NULL;
-	}
+	if (!fn)
+		goto fail;
 
 	d = msi_create_irq_domain(fn, domain_info, parent);
-	if (!d) {
-		irq_domain_free_fwnode(fn);
-		kfree(domain_info);
-	}
+	if (!d)
+		goto fail;
 	return d;
+
+fail:
+	irq_domain_free_fwnode(fn);
+	kfree(domain_info);
+	return NULL;
 }
 
 int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc,
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3557,7 +3557,7 @@ static struct irq_domain *get_irq_domain
 	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
 	if (!iommu)
-		return NULL;
+		return ERR_PTR(-ENODEV);
 
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
@@ -3565,7 +3565,7 @@ static struct irq_domain *get_irq_domain
 		return iommu->ir_domain;
 	default:
 		WARN_ON_ONCE(1);
-		return NULL;
+		return ERR_PTR(-ENODEV);
 	}
 }
 
@@ -3578,7 +3578,7 @@ static struct irq_domain *get_irq_domain
 
 	devid = get_devid(info);
 	if (devid < 0)
-		return NULL;
+		return ERR_PTR(-ENODEV);
 	return get_irq_domain_for_devid(info, devid);
 }
 
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -212,7 +212,7 @@ static struct irq_domain *map_hpet_to_ir
 		if (ir_hpet[i].id == hpet_id && ir_hpet[i].iommu)
 			return ir_hpet[i].iommu->ir_domain;
 	}
-	return NULL;
+	return ERR_PTR(-ENODEV);
 }
 
 static struct intel_iommu *map_ioapic_to_iommu(int apic)
@@ -230,7 +230,7 @@ static struct irq_domain *map_ioapic_to_
 {
 	struct intel_iommu *iommu = map_ioapic_to_iommu(apic);
 
-	return iommu ? iommu->ir_domain : NULL;
+	return iommu ? iommu->ir_domain : ERR_PTR(-ENODEV);
 }
 
 static struct irq_domain *map_dev_to_ir(struct pci_dev *dev)

WARNING: multiple messages have this Message-ID (diff)
From: Thomas Gleixner <tglx@linutronix.de>
To: David Woodhouse <dwmw2@infradead.org>, x86@kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	iommu <iommu@lists.linux-foundation.org>,
	linux-hyperv@vger.kernel.org, kvm <kvm@vger.kernel.org>
Subject: Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()
Date: Thu, 08 Oct 2020 14:40:57 +0200	[thread overview]
Message-ID: <878scgx546.fsf@nanos.tec.linutronix.de> (raw)
In-Reply-To: <2c0712352812ab114cb711236703fd7c308a5bf2.camel@infradead.org>

On Thu, Oct 08 2020 at 12:10, David Woodhouse wrote:
> On Thu, 2020-10-08 at 11:34 +0200, Thomas Gleixner wrote:
>> The overall conclusion for this is:
>> 
>>  1) X2APIC support on bare metal w/o irq remapping is not going to
>>     happen unless you:
>> 
>>       - added support in multi-queue devices which utilize managed
>>         interrupts
>>         
>>       - audited the whole tree for other assumptions related to the
>>         reachability of possible CPUs.
>> 
>>     I'm not expecting you to be done with that before I retire so for
>>     me it's just not going to happen :)
>
> Makes sense. It probably does mean we should a BUG_ON for the case
> where IRQ remapping *is* enabled but any device is found which isn't
> behind it. But that's OK.

We can kinda gracefully handle that. See the completely untested and
incomplete patch below.

>>  2) X2APIC support on VIRT is possible if the extended ID magic is
>>     supported by the hypervisor because that does not make any CPU
>>     unreachable for MSI and therefore the multi-queue muck and
>>     everything else just works.
>> 
>>     This requires to have either the domain affinity limitation for HPET
>>     in place or just to force disable HPET or at least HPET-MSI which is
>>     a reasonable tradeoff.
>> 
>>     HPET is not required for guests which have kvmclock and
>>     APIC/deadline timer and known (hypervisor provided) frequencies.
>
> HPET-MSI should work fine. Like the IOAPIC, it's just a child of the
> *actual* MSI domain. The address/data in the MSI message are completely
> opaque to it, and if the parent domain happens to put meaningful
> information into bits 11-5 of the MSI address, the HPET won't even
> notice.
>
> The HPET's Tn_FSB_INT_ADDR register does have a full 32 bits of the MSI
> address; it's not doing bit-swizzling like the IOAPIC does, which might
> potentially *not* have been able to set certain bits in the MSI.

Indeed. I thought it was crippled in some way, but you're right it has
all the bits.

Thanks,

        tglx
---
Subject: x86/iommu: Make interrupt remapping more robust
From: Thomas Gleixner <tglx@linutronix.de>
Date: Thu, 08 Oct 2020 14:09:44 +0200

Needs to be split into pieces and cover PCI proper. Right now PCI gets a
NULL pointer assigned which makes it explode at the wrong place
later. Also hyperv iommu wants some love.

NOT-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/io_apic.c      |    4 +++-
 arch/x86/kernel/apic/msi.c          |   24 ++++++++++++++----------
 drivers/iommu/amd/iommu.c           |    6 +++---
 drivers/iommu/intel/irq_remapping.c |    4 ++--
 4 files changed, 22 insertions(+), 16 deletions(-)

--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2300,7 +2300,9 @@ static int mp_irqdomain_create(int ioapi
 	info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
 	info.devid = mpc_ioapic_id(ioapic);
 	parent = irq_remapping_get_irq_domain(&info);
-	if (!parent)
+	if (IS_ERR(parent))
+		return PTR_ERR(parent);
+	else if (!parent)
 		parent = x86_vector_domain;
 	else
 		name = "IO-APIC-IR";
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -415,9 +415,9 @@ static struct msi_domain_info hpet_msi_d
 struct irq_domain *hpet_create_irq_domain(int hpet_id)
 {
 	struct msi_domain_info *domain_info;
+	struct fwnode_handle *fn = NULL;
 	struct irq_domain *parent, *d;
 	struct irq_alloc_info info;
-	struct fwnode_handle *fn;
 
 	if (x86_vector_domain == NULL)
 		return NULL;
@@ -432,25 +432,29 @@ struct irq_domain *hpet_create_irq_domai
 	init_irq_alloc_info(&info, NULL);
 	info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
 	info.devid = hpet_id;
+
 	parent = irq_remapping_get_irq_domain(&info);
-	if (parent == NULL)
+	if (IS_ERR(parent))
+		goto fail;
+	else if (!parent)
 		parent = x86_vector_domain;
 	else
 		hpet_msi_controller.name = "IR-HPET-MSI";
 
 	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
 					      hpet_id);
-	if (!fn) {
-		kfree(domain_info);
-		return NULL;
-	}
+	if (!fn)
+		goto fail;
 
 	d = msi_create_irq_domain(fn, domain_info, parent);
-	if (!d) {
-		irq_domain_free_fwnode(fn);
-		kfree(domain_info);
-	}
+	if (!d)
+		goto fail;
 	return d;
+
+fail:
+	irq_domain_free_fwnode(fn);
+	kfree(domain_info);
+	return NULL;
 }
 
 int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc,
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3557,7 +3557,7 @@ static struct irq_domain *get_irq_domain
 	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
 	if (!iommu)
-		return NULL;
+		return ERR_PTR(-ENODEV);
 
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
@@ -3565,7 +3565,7 @@ static struct irq_domain *get_irq_domain
 		return iommu->ir_domain;
 	default:
 		WARN_ON_ONCE(1);
-		return NULL;
+		return ERR_PTR(-ENODEV);
 	}
 }
 
@@ -3578,7 +3578,7 @@ static struct irq_domain *get_irq_domain
 
 	devid = get_devid(info);
 	if (devid < 0)
-		return NULL;
+		return ERR_PTR(-ENODEV);
 	return get_irq_domain_for_devid(info, devid);
 }
 
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -212,7 +212,7 @@ static struct irq_domain *map_hpet_to_ir
 		if (ir_hpet[i].id == hpet_id && ir_hpet[i].iommu)
 			return ir_hpet[i].iommu->ir_domain;
 	}
-	return NULL;
+	return ERR_PTR(-ENODEV);
 }
 
 static struct intel_iommu *map_ioapic_to_iommu(int apic)
@@ -230,7 +230,7 @@ static struct irq_domain *map_ioapic_to_
 {
 	struct intel_iommu *iommu = map_ioapic_to_iommu(apic);
 
-	return iommu ? iommu->ir_domain : NULL;
+	return iommu ? iommu->ir_domain : ERR_PTR(-ENODEV);
 }
 
 static struct irq_domain *map_dev_to_ir(struct pci_dev *dev)
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

  reply	other threads:[~2020-10-08 12:41 UTC|newest]

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-05 15:28 [PATCH 0/13] Fix per-domain IRQ affinity, allow >255 CPUs on x86 without IRQ remapping David Woodhouse
2020-10-05 15:28 ` David Woodhouse
2020-10-05 15:28 ` [PATCH 01/13] x86/apic: Use x2apic in guest kernels even with unusable CPUs David Woodhouse
2020-10-05 15:28   ` David Woodhouse
2020-10-05 15:28   ` [PATCH 02/13] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
2020-10-05 15:28     ` David Woodhouse
2020-10-06 20:45     ` Thomas Gleixner
2020-10-06 20:45       ` Thomas Gleixner
2020-10-05 15:28   ` [PATCH 03/13] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
2020-10-05 15:28     ` David Woodhouse
2020-10-05 15:28   ` [PATCH 04/13] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available David Woodhouse
2020-10-05 15:28     ` David Woodhouse
2020-10-05 15:28   ` [PATCH 05/13] genirq: Prepare for default affinity to be passed to __irq_alloc_descs() David Woodhouse
2020-10-05 15:28     ` David Woodhouse
2020-10-06 21:01     ` Thomas Gleixner
2020-10-06 21:01       ` Thomas Gleixner
2020-10-06 21:07       ` David Woodhouse
2020-10-06 21:07         ` David Woodhouse
2020-10-05 15:28   ` [PATCH 06/13] genirq: Add default_affinity argument " David Woodhouse
2020-10-05 15:28     ` David Woodhouse
2020-10-06 21:06     ` Thomas Gleixner
2020-10-06 21:06       ` Thomas Gleixner
2020-10-05 15:28   ` [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs() David Woodhouse
2020-10-05 15:28     ` David Woodhouse
2020-10-06 21:26     ` Thomas Gleixner
2020-10-06 21:26       ` Thomas Gleixner
2020-10-07  7:19       ` David Woodhouse
2020-10-07  7:19         ` David Woodhouse
2020-10-07 13:37         ` Thomas Gleixner
2020-10-07 13:37           ` Thomas Gleixner
2020-10-07 14:10           ` David Woodhouse
2020-10-07 14:10             ` David Woodhouse
2020-10-07 15:57             ` Thomas Gleixner
2020-10-07 15:57               ` Thomas Gleixner
2020-10-07 16:11               ` David Woodhouse
2020-10-07 16:11                 ` David Woodhouse
2020-10-07 20:53                 ` Thomas Gleixner
2020-10-07 20:53                   ` Thomas Gleixner
2020-10-08  7:21               ` David Woodhouse
2020-10-08  7:21                 ` David Woodhouse
2020-10-08  9:34                 ` Thomas Gleixner
2020-10-08  9:34                   ` Thomas Gleixner
2020-10-08 11:10                   ` David Woodhouse
2020-10-08 11:10                     ` David Woodhouse
2020-10-08 12:40                     ` Thomas Gleixner [this message]
2020-10-08 12:40                       ` Thomas Gleixner
2020-10-09  7:54                       ` David Woodhouse
2020-10-09  7:54                         ` David Woodhouse
2020-10-05 15:28   ` [PATCH 08/13] genirq: Add irq_domain_set_affinity() David Woodhouse
2020-10-05 15:28     ` David Woodhouse
2020-10-06 21:32     ` Thomas Gleixner
2020-10-06 21:32       ` Thomas Gleixner
2020-10-07  7:22       ` David Woodhouse
2020-10-07  7:22         ` David Woodhouse
2020-10-05 15:28   ` [PATCH 09/13] x86/irq: Add x86_non_ir_cpumask David Woodhouse
2020-10-05 15:28     ` David Woodhouse
2020-10-06 21:42     ` Thomas Gleixner
2020-10-06 21:42       ` Thomas Gleixner
2020-10-07  7:25       ` David Woodhouse
2020-10-07  7:25         ` David Woodhouse
2020-10-05 15:28   ` [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR David Woodhouse
2020-10-05 15:28     ` David Woodhouse
2020-10-06 21:54     ` Thomas Gleixner
2020-10-06 21:54       ` Thomas Gleixner
2020-10-07  7:48       ` David Woodhouse
2020-10-07  7:48         ` David Woodhouse
2020-10-07 12:59         ` Thomas Gleixner
2020-10-07 12:59           ` Thomas Gleixner
2020-10-07 13:08           ` David Woodhouse
2020-10-07 13:08             ` David Woodhouse
2020-10-07 14:05             ` Thomas Gleixner
2020-10-07 14:05               ` Thomas Gleixner
2020-10-07 14:23               ` David Woodhouse
2020-10-07 14:23                 ` David Woodhouse
2020-10-07 16:02                 ` Thomas Gleixner
2020-10-07 16:02                   ` Thomas Gleixner
2020-10-07 16:15                   ` David Woodhouse
2020-10-07 16:15                     ` David Woodhouse
2020-10-07 15:05               ` David Woodhouse
2020-10-07 15:05                 ` David Woodhouse
2020-10-07 15:25                 ` Thomas Gleixner
2020-10-07 15:25                   ` Thomas Gleixner
2020-10-07 15:46                   ` David Woodhouse
2020-10-07 15:46                     ` David Woodhouse
2020-10-07 17:23                     ` Thomas Gleixner
2020-10-07 17:23                       ` Thomas Gleixner
2020-10-07 17:34                       ` David Woodhouse
2020-10-07 17:34                         ` David Woodhouse
2020-10-05 15:28   ` [PATCH 11/13] x86/smp: Allow more than 255 CPUs even without interrupt remapping David Woodhouse
2020-10-05 15:28     ` David Woodhouse
2020-10-05 15:28   ` [PATCH 12/13] iommu/irq_remapping: Kill most of hyperv-iommu.c now it's redundant David Woodhouse
2020-10-05 15:28     ` David Woodhouse
2020-10-05 15:28   ` [PATCH 13/13] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
2020-10-05 15:28     ` David Woodhouse
2020-10-07  8:14     ` Paolo Bonzini
2020-10-07  8:14       ` Paolo Bonzini
2020-10-07  8:59       ` David Woodhouse
2020-10-07  8:59         ` David Woodhouse
2020-10-07 11:15         ` Paolo Bonzini
2020-10-07 11:15           ` Paolo Bonzini
2020-10-07 12:04           ` David Woodhouse
2020-10-07 12:04             ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878scgx546.fsf@nanos.tec.linutronix.de \
    --to=tglx@linutronix.de \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.