Linux-HyperV Archive on lore.kernel.org
 help / color / Atom feed
From: David Woodhouse <dwmw2@infradead.org>
To: Thomas Gleixner <tglx@linutronix.de>, x86@kernel.org
Cc: iommu <iommu@lists.linux-foundation.org>,
	kvm <kvm@vger.kernel.org>,
	linux-hyperv@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR
Date: Wed, 07 Oct 2020 16:46:10 +0100
Message-ID: <2f09a1f97d97e638e90c6eca3ebeebb4be852f58.camel@infradead.org> (raw)
In-Reply-To: <87d01u3vo6.fsf@nanos.tec.linutronix.de>


[-- Attachment #1: Type: text/plain, Size: 2879 bytes --]

On Wed, 2020-10-07 at 17:25 +0200, Thomas Gleixner wrote:
> It's clearly how the hardware works. MSI has a message store of some
> sorts and if the entry is enabled then the MSI chip (in PCI or
> elsewhere) will send exactly the message which is in that message
> store. It knows absolutely nothing about what the message means and how
> it is composed. The only things which MSI knows about is whether the
> message address is 64bit wide or not and whether the entries are
> maskable or not and how many entries it can store.
> 
> Software allocates a message target at the underlying irq domain (vector
> or remap) and that underlying irq domain defines the properties.
> 
> If qemu emulates it differently then it's qemu's problem, but that does
> not make it in anyway something which influences the irq domain
> abstractions which are correctly modeled after how the hardware works.
> 
> > Not really the important part to deal with right now, either way.
> 
> Oh yes it is. We define that upfront and not after the fact.

The way the hardware works is that something handles physical address
cycles to addresses in the (on x86) 0xFEExxxxx range, and turns them
into actual interrupts on the appropriate CPU — where the APIC ID and
vector (etc.) are directly encoded in the bits of the address and the
data written. That compatibility x86 APIC MSI format is where the 8-bit 
(or 15-bit) limit comes from.

Then interrupt remapping comes along, and now those physical address
cycles are actually handled by the IOMMU — which can either handle the
compatibility format as before, or use a different format of
address/data bits and perform a lookup in its IRTE table.

The PCI MSI domain, HPET, and even the IOAPIC are just the things out
there on the bus which might perform those physical address cycles. And
yes, as you say they're just a message store sending exactly the
message that was composed for them. They know absolutely nothing about
what the message means and how it is composed.

It so happens that in Linux, we don't really architect the software
like that. So each of the PCI MSI domain, HPET, and IOAPIC have their
*own* message composer which has the same limits and composes basically
the same messages as if it was *their* format, not dictated to them by
the APIC upstream. And that's what we're both getting our panties in a
knot about, I think.

It really doesn't matter that much to the underlying generic irqdomain
support for limited affinities. Except that you want to make the
generic code support the concept of a child domain supporting *more*
CPUs than its parent, which really doesn't make much sense if you think
about it. But it isn't that hard to do, and if it means we don't have
to argue any more about the x86 hierarchy not matching the hardware
then it's a price worth paying. 



[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

  reply index

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-05 15:28 [PATCH 0/13] Fix per-domain IRQ affinity, allow >255 CPUs on x86 without IRQ remapping David Woodhouse
2020-10-05 15:28 ` [PATCH 01/13] x86/apic: Use x2apic in guest kernels even with unusable CPUs David Woodhouse
2020-10-05 15:28   ` [PATCH 02/13] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
2020-10-06 20:45     ` Thomas Gleixner
2020-10-05 15:28   ` [PATCH 03/13] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
2020-10-05 15:28   ` [PATCH 04/13] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available David Woodhouse
2020-10-05 15:28   ` [PATCH 05/13] genirq: Prepare for default affinity to be passed to __irq_alloc_descs() David Woodhouse
2020-10-06 21:01     ` Thomas Gleixner
2020-10-06 21:07       ` David Woodhouse
2020-10-05 15:28   ` [PATCH 06/13] genirq: Add default_affinity argument " David Woodhouse
2020-10-06 21:06     ` Thomas Gleixner
2020-10-05 15:28   ` [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs() David Woodhouse
2020-10-06 21:26     ` Thomas Gleixner
2020-10-07  7:19       ` David Woodhouse
2020-10-07 13:37         ` Thomas Gleixner
2020-10-07 14:10           ` David Woodhouse
2020-10-07 15:57             ` Thomas Gleixner
2020-10-07 16:11               ` David Woodhouse
2020-10-07 20:53                 ` Thomas Gleixner
2020-10-08  7:21               ` David Woodhouse
2020-10-08  9:34                 ` Thomas Gleixner
2020-10-08 11:10                   ` David Woodhouse
2020-10-08 12:40                     ` Thomas Gleixner
2020-10-09  7:54                       ` David Woodhouse
2020-10-05 15:28   ` [PATCH 08/13] genirq: Add irq_domain_set_affinity() David Woodhouse
2020-10-06 21:32     ` Thomas Gleixner
2020-10-07  7:22       ` David Woodhouse
2020-10-05 15:28   ` [PATCH 09/13] x86/irq: Add x86_non_ir_cpumask David Woodhouse
2020-10-06 21:42     ` Thomas Gleixner
2020-10-07  7:25       ` David Woodhouse
2020-10-05 15:28   ` [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR David Woodhouse
2020-10-06 21:54     ` Thomas Gleixner
2020-10-07  7:48       ` David Woodhouse
2020-10-07 12:59         ` Thomas Gleixner
2020-10-07 13:08           ` David Woodhouse
2020-10-07 14:05             ` Thomas Gleixner
2020-10-07 14:23               ` David Woodhouse
2020-10-07 16:02                 ` Thomas Gleixner
2020-10-07 16:15                   ` David Woodhouse
2020-10-07 15:05               ` David Woodhouse
2020-10-07 15:25                 ` Thomas Gleixner
2020-10-07 15:46                   ` David Woodhouse [this message]
2020-10-07 17:23                     ` Thomas Gleixner
2020-10-07 17:34                       ` David Woodhouse
2020-10-05 15:28   ` [PATCH 11/13] x86/smp: Allow more than 255 CPUs even without interrupt remapping David Woodhouse
2020-10-05 15:28   ` [PATCH 12/13] iommu/irq_remapping: Kill most of hyperv-iommu.c now it's redundant David Woodhouse
2020-10-05 15:28   ` [PATCH 13/13] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
2020-10-07  8:14     ` Paolo Bonzini
2020-10-07  8:59       ` David Woodhouse
2020-10-07 11:15         ` Paolo Bonzini
2020-10-07 12:04           ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2f09a1f97d97e638e90c6eca3ebeebb4be852f58.camel@infradead.org \
    --to=dwmw2@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-HyperV Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-hyperv/0 linux-hyperv/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-hyperv linux-hyperv/ https://lore.kernel.org/linux-hyperv \
		linux-hyperv@vger.kernel.org
	public-inbox-index linux-hyperv

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-hyperv


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git